iframe-proxy

groeneai · 2026-06-16T12:30:56Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a FileLog server crash (LOGICAL_ERROR: Last stored last_written_position in meta file ... is bigger than current last_written_pos) that could happen when a watched file was deleted and recreated reusing the same inode.

Description

Symptom

StorageFileLog could abort the server with a LOGICAL_ERROR:

Last stored last_written_position in meta file a.csv is bigger than current last_written_pos (N > 0)

Crash path: StorageFileLog::checkOffsetIsValid -> serialize -> openFilesAndSetPos.

Root cause

When a watched file is deleted and recreated under the same name within one
directory-watcher backoff window, the DW_ITEM_REMOVED and DW_ITEM_ADDED
events are drained together. updateFileInfos() processes DW_ITEM_ADDED
first: it flips the file status back to OPEN (so the later REMOVED-cleanup
loop skips it) and resets the in-memory FileMeta for the inode to offset 0.

onFileAppeared() only removes the stale on-disk meta when the inode
changes. If the recreated file reuses the same inode (deterministic on
filesystems that recycle freed inodes, e.g. ext4), that guard does not fire,
so the on-disk meta with the old, larger offset is left behind while the
in-memory offset is 0. The next serialize() reads the larger stored offset
and trips checkOffsetIsValid.

PR #101408 covered the inode-change case (a fresh inode). This PR covers
inode reuse, which is what surfaced under stress in CI.

Fix

A freshly added file is read from offset 0, so any on-disk meta under its
name is stale. Remove it in the DW_ITEM_ADDED branch so the on-disk and
in-memory metadata stay consistent. The DW_ITEM_MOVED_TO (rename) branch is
intentionally left unchanged: it preserves the consumed offset, and a
moved-out name's on-disk meta is already removed by the REMOVED-cleanup loop
before the new inode is observed.

Validation

New regression test 04342_file_log_delete_recreate_same_inode reproduces
the crash on an unfixed server (it aborts on the first delete+recreate round)
and passes after the fix. Existing FileLog regression tests
(04242_file_log_delete_recreate, 04257_file_log_rename_and_recreate) and
the FileLog streaming suite continue to pass, and after delete+recreate the
new content is still streamed correctly (no rows skipped).

No related open issue was found.

Version info

Merged into: 26.7.1.58

StorageFileLog could abort with a LOGICAL_ERROR when a watched file was deleted and recreated reusing the same inode within one directory-watcher backoff window: Last stored last_written_position in meta file a.csv is bigger than current last_written_pos (N > 0) When the DW_ITEM_REMOVED and DW_ITEM_ADDED events for the same name are drained together, updateFileInfos() processes DW_ITEM_ADDED first: it flips the file status back to OPEN (so the later REMOVED-cleanup loop skips it) and resets the in-memory FileMeta for the inode to offset 0. onFileAppeared() only cleans up the on-disk meta when the inode CHANGES, so on inode REUSE the stale on-disk meta (with the old, larger offset) is left behind. The next serialize() then reads that larger stored offset and trips checkOffsetIsValid(). PR ClickHouse#101408 covered the inode-change case (a fresh inode). This covers inode reuse, which is deterministic on filesystems that recycle freed inodes (e.g. ext4) and is what surfaced under stress in CI. A freshly added file is read from offset 0, so any on-disk meta under its name is stale; remove it in the DW_ITEM_ADDED branch so the on-disk and in-memory metadata stay consistent. The DW_ITEM_MOVED_TO (rename) branch is intentionally left unchanged: it preserves the consumed offset, and the moved-out name's on-disk meta is already removed by the REMOVED-cleanup loop before the new inode is seen. Regression test 04342_file_log_delete_recreate_same_inode reproduces the crash on an unfixed server (delete + immediate recreate, high watch backoff) and passes after the fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-16T12:31:23Z

Pre-PR validation gate

#	Question	Answer
a	Deterministic repro?	Yes. FileLog table on an ext4 dir (ext4 deterministically reuses freed inodes; tmpfs/xfs do not), `poll_directory_watch_events_backoff_init=max=5000` so REMOVED+ADDED batch together, then `rm a.csv` + immediate recreate of `a.csv`. The unfixed server aborts (here on round 2) with `Last stored last_written_position in meta file a.csv is bigger than current last_written_pos (6 > 0)`. The watcher trace confirms `onItemRemoved a.csv` + `onItemAdded a.csv` drained in the same batch, and the recreated file reused the original inode.
b	Root cause explained?	When the REMOVED and ADDED events for the same name are drained together, `updateFileInfos()` handles `DW_ITEM_ADDED` first: it flips the file status back to `OPEN` (so the later REMOVED-cleanup loop skips it) and resets the in-memory `FileMeta` for the inode to offset 0. `onFileAppeared()` only removes the stale on-disk meta when the inode CHANGES; on inode REUSE that guard is false, so the on-disk meta keeps the old larger offset while the in-memory offset is 0. The next `serialize()` reads the larger stored offset -> `checkOffsetIsValid` raises LOGICAL_ERROR. PR #101408 covered the inode-change case only.
c	Fix matches root cause?	Yes. A freshly added file is read from offset 0, so any on-disk meta under its name is stale. The fix removes that on-disk meta in the `DW_ITEM_ADDED` branch (`disk->removeFileIfExists(getFullMetaPath(file_name))`) right where the in-memory meta is reset to 0, restoring the on-disk/in-memory invariant `serialize()` relies on. This targets the exact mechanism in (b), not a symptom (no assertion widening, no `no-random-*` tag).
d	Test intent preserved / new tests added?	New regression test `04342_file_log_delete_recreate_same_inode.sh` added (delete + immediate recreate over several rounds under high watch backoff). Existing `04242`/`04257` and the FileLog streaming suite still pass, and a delete+recreate still streams the new content correctly (`[1,2,3,100,200]`, nothing skipped).
e	Both directions demonstrated?	Yes. New test FAILS on the unfixed binary (Build ID 106eddda...): server crashes on round 1 with the `(6 > 0)` LOGICAL_ERROR. New test PASSES on the fixed binary (Build ID 2d56c397...): survives all rounds. Build IDs verified to differ and to match the running server logs.
f	Fix is general, not a narrow patch?	The other zero-offset reset site, the `DW_ITEM_MOVED_TO` else branch, was probed against the fixed binary (move-out + move-in with a new inode, 30 rounds): no crash. It is structurally safe because a moved-out name's on-disk meta is already removed by the REMOVED-cleanup loop before the new inode is observed, and `MOVED_TO` for a same-inode rename preserves the offset by design. The fix is scoped to the one branch (`DW_ITEM_ADDED`) where the invariant is actually broken.

Session id: cron:clickhouse-worker-slot-5:20260616-114000

groeneai · 2026-06-16T12:31:42Z

cc @antaljanosbenjamin @kssenii — could you review this? It fixes a FileLog server crash (LOGICAL_ERROR: ... last_written_position ... bigger than current (N > 0)) on delete+recreate that reuses the same inode within one watcher backoff window: the DW_ITEM_ADDED branch resets the in-memory offset to 0 but left the stale on-disk meta behind (onFileAppeared only cleans up on inode change, not inode reuse). Extends PR #101408 to the inode-reuse case; regression test included.

clickhouse-gh · 2026-06-16T17:09:51Z

Workflow [PR], commit [963e74b]

Summary: ❌

job_name	test_name	status	info	comment
BuzzHouse (amd_tsan)		ERROR		IGNORED

AI Review

Summary

The PR fixes FileLog stale metadata when a watched file is deleted and recreated with the same inode: the DW_ITEM_ADDED path now drops the stale on-disk meta before resetting the in-memory FileMeta to offset 0. The current regression test covers both required conditions, same-inode reuse and same updateFileInfos drain, and also verifies that recreated rows are streamed. I did not find unresolved correctness issues in the current diff.

Final Verdict

Status: ✅ Approve

The test relied on ext4 reusing the just-freed inode, so on tmpfs/xfs an unfixed build took the already-covered inode-change path and could pass. Force the same-inode recreate deterministically with a hard link held outside the watched directory: pin the inode across the unlink, rewrite content through the held link, then link it back under the original name. This produces a DW_ITEM_REMOVED + DW_ITEM_ADDED batch for the watched name with an unchanged inode regardless of filesystem allocation policy. Also assert the recreated rows stream, proving FileLog both avoids the LOGICAL_ERROR and re-reads from offset 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-16T19:09:07Z

Done in b07494e. The test no longer depends on filesystem inode-allocation policy.

It now forces the same-inode recreate deterministically with a hard link held outside the watched directory: pin the inode across the unlink, rewrite the content through the held link, then link the same inode back as a.csv. That yields a DW_ITEM_REMOVED + DW_ITEM_ADDED batch for the watched name with an unchanged inode, independent of whether the filesystem would have reused the freed inode. It also waits for the recreated rows and asserts the full stream 1,2,3,100,200, proving FileLog both avoids the exception and re-reads from offset 0.

Validated on xfs (which never reuses inodes, so the old test could not have reproduced here):

Unfixed binary (build id 106eddda): aborts with Last stored last_written_position in meta file a.csv is bigger than current last_written_pos (6 > 0), stack checkOffsetIsValid -> serialize -> openFilesAndSetPos -> streamToViews. The server log confirms onItemRemoved a.csv + onItemAdded a.csv in the same watcher drain (same inode).
Fixed binary (build id 2d56c397): survives and streams 1,2,3,100,200; no LOGICAL_ERROR in the log, and the REMOVED+ADDED batch is still processed (path exercised, not avoided).

The FileLog engine is Linux-only (USE_FILELOG is gated on OS_LINUX in src/configure_config.cmake and the directory watcher uses inotify), so it is not compiled into the Darwin / fast-test build. Without no-fasttest the test runs there and CREATE TABLE ... ENGINE = FileLog fails with UNKNOWN_STORAGE, breaking Fast test (arm_darwin). Matches the existing no-fasttest FileLog tests 04031 and 04202. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The same-inode recreate was forced via a held hard link, but the DW_ITEM_REMOVED + DW_ITEM_ADDED pair was not guaranteed to reach the storage in one updateFileInfos drain: watchFunc wakes the storage task right after reading the delete event, so on some interleavings an unfixed server could process REMOVED alone (the cleanup loop erases the stale on-disk meta) and then see ADDED as a clean new file, giving a false pass. getEventsAndReset is only called while a dependent view is attached, so detach the materialized view across the delete/relink and re-attach afterwards. The watcher keeps accumulating REMOVED + ADDED while the consumer is detached and both are drained together on re-attach, independent of watchFunc poll timing. Backoff is lowered to 500ms now that batching no longer relies on a wide backoff window. Validated both directions on xfs: unfixed binary (build id 106eddda) aborts with 'last_written_position ... (6 > 0)' in checkOffsetIsValid <- serialize <- openFilesAndSetPos <- streamToViews, with onItemRemoved + onItemAdded for a.csv in one drain; fixed binary (build id 2d56c397) survives and streams 1,2,3,100,200. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-17T05:05:56Z

Done in 963e74b. The REMOVED + ADDED batching is now deterministic too, independent of watchFunc poll timing.

getEventsAndReset is only called from updateFileInfos, which only runs while a dependent view is attached. So the test now detaches file_log_mv before the delete/relink and re-attaches it afterwards: with the consumer detached the watcher keeps accumulating DW_ITEM_REMOVED + DW_ITEM_ADDED, and they are drained together by the first streaming cycle after re-attach. No streaming cycle can run between the two events regardless of how the scheduler interleaves them. Backoff is lowered to 500ms since batching no longer depends on a wide window.

Re-validated both directions on xfs (which never reuses inodes, so the hard-link trick is doing the inode-reuse work):

Unfixed (build id 106eddda): server aborts on re-attach with Last stored last_written_position in meta file a.csv is bigger than current last_written_pos (6 > 0), stack checkOffsetIsValid <- serialize <- openFilesAndSetPos <- streamToViews <- threadFunc. Log shows onItemRemoved a.csv + onItemAdded a.csv in one drain (3 us apart).
Fixed (build id 2d56c397): survives and streams 1,2,3,100,200.

clickhouse-gh · 2026-06-17T08:24:06Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.20%	85.30%	+0.10%
Functions	92.30%	92.30%	+0.00%
Branches	77.50%	77.50%	+0.00%

Changed lines: Changed C/C++ lines covered by tests: 8/8 (100.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 3 line(s) · Uncovered code

Full report · Diff report

groeneai · 2026-06-17T10:19:06Z

CI finished on 963e74b7c5ef. One job-level failure, not PR-caused:

BuzzHouse (amd_tsan): job-level ERROR, FileNotFoundError: .../status.tsv at ci/jobs/ast_fuzzer_job.py:240 (the fuzzer runner aborted before writing status.tsv). No STID, no crash stack, no ClickHouse frame; the other three BuzzHouse builds (amd_debug, amd_msan, arm_asan_ubsan) passed on this same commit. This is the chronic fuzzer-runner harness flake, addressed separately in Report fuzzer runner aborts instead of an opaque status.tsv FileNotFoundError #107676. Unrelated to this StorageFileLog change.
CH Inc sync: CIDB log-export sidecar, infrastructure, orthogonal to this PR.
Mergeable Check / PR: aggregators, red only because of the BuzzHouse job above.

All PR-own checks pass: Fast test, Fast test (arm_darwin), every Build, Integration, and the new 04342 regression test path. AI Review verdict on this commit is Approve.

antaljanosbenjamin · 2026-06-17T14:22:13Z

+                    /// under this name is stale. Drop it to stay consistent with
+                    /// the pos-0 meta set below, else serialize() sees a larger
+                    /// stored offset and raises LOGICAL_ERROR.
+                    disk->removeFileIfExists(getFullMetaPath(file_name));


@groeneai why not to remove it when the file is removed?

Removing it on DW_ITEM_REMOVED does not fire in the interleaving that triggers this crash.

Delete+recreate under the same name within one backoff window puts DW_ITEM_REMOVED and DW_ITEM_ADDED in the same getEventsAndReset() batch, walked in order:

DW_ITEM_REMOVED sets context_by_name[name].status = REMOVED.

DW_ITEM_ADDED (same reused inode) calls onFileAppeared(). The inode is unchanged, so its it->second.inode != inode guard is false (no meta cleanup), and it then resets the context to {OPEN, inode}, clearing REMOVED.

So when the REMOVED-cleanup loop runs, status is OPEN; its if (status == REMOVED) branch (with the removeFileIfExists) is skipped, the stale on-disk meta survives, and the ADDED branch has meanwhile reset the in-memory meta to pos 0. The next serialize() sees on-disk 122 vs in-memory 0 and raises the LOGICAL_ERROR.

DW_ITEM_ADDED is where the invariant actually breaks (the file is reassigned offset 0), so it is the right place to drop the now-stale meta. It mirrors the cleanup onFileAppeared already does for the inode-change case.

Eagerly deleting the meta in the REMOVED handler instead would break renames: DW_ITEM_MOVED_FROM also sets status = REMOVED, and there the meta must survive so DW_ITEM_MOVED_TO can rename it and preserve the offset.

If the file shows up with same inode, then onFileAppeared won't clean up the file metas. Shouldn't they cleaned up here too?

onFileAppeared's cleanup is specific to the inode-CHANGE case: it erases the meta of the name's previous inode (it->second.inode). That has to live in onFileAppeared, because line 931 overwrites it->second with the new inode right after, so this is the last point the orphaned old-inode entry is still identifiable.

The same-inode case is a different operation. There is no orphaned old inode to erase: the same meta_by_inode[inode] entry is reused and must be reset to offset 0 with its on-disk meta dropped. That reset-to-0 is correct only for DW_ITEM_ADDED (a genuinely new file). onFileAppeared is also called from DW_ITEM_MOVED_TO (line 1037), where the offset must be preserved: lines 1040-1046 rename the meta and keep last_writen_position. Cleaning the meta on same-inode inside onFileAppeared would wipe that; a rename onto an existing same-inode name would then fall through to the else at line 1048 and recreate the meta at offset 0, losing the moved file's position.

So the same-inode drop stays in the DW_ITEM_ADDED branch, next to the pos-0 reset (lines 1001-1004) it must stay consistent with. onFileAppeared keeps only the inode-change erase, which depends on reading the old inode before it is overwritten.

groeneai mentioned this pull request Jun 16, 2026

Clear async load tasks on detach to fix self-suggesting table hint #106238

Merged

1 task

PedroTadim added the can be tested Allows running workflows for external contributors label Jun 16, 2026

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 16, 2026

clickhouse-gh Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04342_file_log_delete_recreate_same_inode.sh Outdated

clickhouse-gh Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread tests/queries/0_stateless/04342_file_log_delete_recreate_same_inode.sh

antaljanosbenjamin reviewed Jun 17, 2026

View reviewed changes

antaljanosbenjamin self-assigned this Jun 24, 2026

antaljanosbenjamin approved these changes Jun 24, 2026

View reviewed changes

antaljanosbenjamin added this pull request to the merge queue Jun 25, 2026

Merged via the queue into ClickHouse:master with commit 0d9996b Jun 25, 2026
163 of 165 checks passed

robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 25, 2026

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

groeneai commented Jun 16, 2026 • edited by robot-clickhouse Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Description

Symptom

Root cause

Fix

Validation

Version info

Uh oh!

groeneai commented Jun 16, 2026

Pre-PR validation gate

Uh oh!

groeneai commented Jun 16, 2026

Uh oh!

clickhouse-gh Bot commented Jun 16, 2026 • edited by antaljanosbenjamin Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Final Verdict

Uh oh!

Uh oh!

groeneai commented Jun 16, 2026

Uh oh!

Uh oh!

groeneai commented Jun 17, 2026

Uh oh!

clickhouse-gh Bot commented Jun 17, 2026

LLVM Coverage Report

Uh oh!

groeneai commented Jun 17, 2026

Uh oh!

antaljanosbenjamin Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

groeneai Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

antaljanosbenjamin Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

groeneai Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

groeneai commented Jun 16, 2026 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Jun 16, 2026 •

edited by antaljanosbenjamin

Loading

antaljanosbenjamin Jun 17, 2026 •

edited

Loading