Fix atomic rename for INTO OUTFILE TRUNCATE not preserving data on error by pamarcos · Pull Request #101884 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix atomic rename for INTO OUTFILE TRUNCATE not preserving data on error#101884

Merged
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
pamarcos:fix-outfile-truncate-atomic-rename
May 8, 2026
Merged

Fix atomic rename for INTO OUTFILE TRUNCATE not preserving data on error#101884
alexey-milovidov merged 5 commits into
ClickHouse:masterfrom
pamarcos:fix-outfile-truncate-atomic-rename

Conversation

@pamarcos

@pamarcos pamarcos commented Apr 6, 2026

Copy link
Copy Markdown
Member

The atomic rename feature introduced in #77181 for INTO OUTFILE ... TRUNCATE produced correct results on the happy path, but the atomicity guarantee was broken — on query failure, the original file content was destroyed.

The happy path worked by accident: initOutputFormat read the file path from the AST (which was never updated with the temp path), so it wrote directly to the original file with O_TRUNC. Then performAtomicRename renamed the file to itself (a no-op). The result was correct, just not atomic.

Two bugs made the atomicity non-functional:

  1. processOrdinaryQuery generated a UUID-based temp file path in a local variable but never propagated it to the AST literal. Since initOutputFormat reads the path from the AST, it opened the original file with O_TRUNC, destroying its content immediately instead of writing to the temp file.

  2. performAtomicRename was called unconditionally after receiveResult, even when the server returned an error. receiveResult does not throw on server exceptions — it sets have_error and returns normally. So once bug 1 is fixed, the temp file with partial results would still be renamed over the original on failure.

The fix updates the AST literal with the temp path after generating it (so initOutputFormat writes to the temp file), and guards the atomic rename with a have_error check, cleaning up the temp file on error instead of renaming it.

Two new test cases are added to the existing 03362_into_outfile_atomic_truncate test:

  • Test 7 verifies actual atomicity: runs a slow query in the background and confirms the original file is untouched during execution while a temp file exists.
  • Test 8 verifies error preservation: a query that fails mid-execution does not destroy the original file content.

Fixes #101726

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry:

Fix INTO OUTFILE ... TRUNCATE not actually using atomic rename — the write went directly to the original file instead of a temp file, so on query failure the original content was destroyed. Now the data is written to a temp file and renamed only on success.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.450

The atomic rename feature for `INTO OUTFILE ... TRUNCATE` produced correct
results on the happy path, but the atomicity guarantee was broken — on
query failure, the original file content was already destroyed.

The happy path worked by accident: `initOutputFormat` read the file path
from the AST (which was never updated with the temp path), so it wrote
directly to the original file with `O_TRUNC`. Then `performAtomicRename`
renamed the file to itself (a no-op). The result was correct, just not
atomic.

Two bugs made the atomicity non-functional:

1. The temp file path generated in `processOrdinaryQuery` was stored in a
   local variable but never propagated to the AST literal. Since
   `initOutputFormat` reads the path from the AST, it opened the original
   file with `O_TRUNC`, destroying its content immediately instead of
   writing to the temp file.

2. `performAtomicRename` was called unconditionally after `receiveResult`,
   even when the server returned an error. Since `receiveResult` does not
   throw on server exceptions (it sets `have_error` instead), once bug 1
   is fixed the temp file with partial results would be renamed over the
   original on failure.

Fix by updating the AST literal with the temp path after generating it,
and by guarding the atomic rename with a `have_error` check (cleaning up
the temp file on error instead).

ClickHouse#101726
@clickhouse-gh

clickhouse-gh Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 6, 2026
The temp file path is generated in the same parent directory as the target
(required for `rename(2)` to work on the same filesystem). For special
files like `/dev/null`, this produces an unwritable path like
`/dev/tmp_<uuid>.null`. There is nothing to preserve atomically for
non-regular files anyway, so skip the temp-file-and-rename logic entirely
when the target is not a regular file.
@alexey-milovidov

Copy link
Copy Markdown
Member

The flaky test failures in this PR (01605_drop_settings_profile_while_assigned, 01418_custom_settings, 02015_async_inserts_4) are caused by non-unique access entity names and will be resolved by #102131.

fm4v added a commit that referenced this pull request Apr 14, 2026
Split flaky check and targeted check into separate roles:

Flaky check — runs only new/modified tests from the PR with thread
fuzzer enabled. Tests run in parallel with themselves (shared queue)
to ensure new tests are parallel-safe under randomized thread ordering.

Targeted check — runs previously failed and coverage-relevant tests
with per-worker partitioning (--no-self-parallel) so the same test
never runs concurrently on multiple workers, avoiding conflicts on
shared resources (hardcoded users, roles, settings profiles, etc.).
Both use --test-runs 50.

Changes:
- Add --no-self-parallel flag to clickhouse-test: partitions tests
  by name hash so all copies go to the same worker
- Flaky check: only changed tests, thread fuzzer, shared queue
- Targeted check: relevant tests, no thread fuzzer, per-worker queues,
  50 runs, single clickhouse-test invocation
- Restore original thread fuzzer parameters (reverts reductions from
  06c945a and 9790435)
- Add amd_binary to flaky check, use AMD_LARGE for targeted debug
- Targeted check uses --flaky-check --no-self-parallel

PRs affected by tests running in parallel with themselves:
#102015 #101093 #99653 #101503 #101108 #101884 #100746
#101447 #100941 #101099 #100203 #96130

#102038
fm4v added a commit that referenced this pull request Apr 14, 2026
Split flaky check and targeted check into separate roles:

Flaky check — runs only new/modified tests from the PR with thread
fuzzer enabled. Tests can run in parallel with themselves (shared
queue) to ensure new tests are parallel-safe.

Targeted check — runs previously failed and coverage-relevant tests
with per-worker partitioning (--no-self-parallel) so the same test
never runs concurrently on multiple workers, avoiding conflicts on
shared resources (hardcoded users, roles, settings profiles, etc.).
Both use --test-runs 50.

Changes:
- Add --no-self-parallel flag to clickhouse-test: partitions tests
  by name hash so all copies go to the same worker
- Flaky check: only changed tests, thread fuzzer, shared queue
- Targeted check: relevant tests, no thread fuzzer, per-worker
  partitioning, 50 runs, single clickhouse-test invocation
- Restore original thread fuzzer parameters
- Change targeted check to arm_asan_ubsan on ARM_LARGE

PRs affected by tests running in parallel with themselves:
#102015 #101093 #99653 #101503 #101108 #101884 #100746
#101447 #100941 #101099 #100203 #96130
@pamarcos

Copy link
Copy Markdown
Member Author

The unit test failure is tracked here, unrelated to this PR: #102542

@clickhouse-gh

clickhouse-gh Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 81.82% (18/22) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 8, 2026
@alexey-milovidov alexey-milovidov self-assigned this May 8, 2026
Merged via the queue into ClickHouse:master with commit fd077f9 May 8, 2026
165 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Atomic rename for INTO OUTFILE TRUNCATE is broken: temp file path never reaches initOutputFormat

3 participants