iframe-proxy

groeneai · 2026-05-16T04:57:42Z

Motivation

Issue #104900: a Refreshable Materialized View with REFRESH ... running on
a server with a finite max_table_size_to_drop (default 50 GB) leaks one
temporary inner table per refresh once the rotated-out target exceeds the
limit. Disk usage grows without bound until the server runs out of space.

Root cause

Each refresh of a MATERIALIZED VIEW ... REFRESH ... (non-APPEND) creates a
fresh inner table and atomically swaps it with the existing target via
exchangeTargetTable. The previous target then carries the
.tmp.inner_id.<uuid> name and is dropped by dropTempTable
(src/Storages/StorageMaterializedView.cpp). That drop goes through the
regular InterpreterDropQuery path with the refresh context, so
MergeTreeData::checkTableCanBeDropped enforces
max_table_size_to_drop / max_partition_size_to_drop against the
rotated-out table.

When the table exceeds the limit, the drop throws
TABLE_SIZE_EXCEEDS_MAX_DROP_SIZE_LIMIT. dropTempTable catches the
exception and logs it, but the refresh still reports success — and the
temporary inner is left behind. The next refresh creates yet another fresh
inner and tries to drop the previous one, which fails the same way. The
view's data directory grows without bound.

Reproduction stack (from the issue):

4. Context::checkCanBeDropped(...)
5. MergeTreeData::checkTableCanBeDropped(...)
6. InterpreterDropQuery::executeToTableImpl(...)
7. InterpreterDropQuery::executeSingleDropQuery(...)
8. InterpreterDropQuery::execute()
9. StorageMaterializedView::dropTempTable(StorageID, Context, String&)
10. RefreshTask::refreshTask()

Approach

The size safety nets are user-protection knobs — they exist so that a user
running an interactive DROP TABLE on a huge MergeTree gets a chance to
reconsider. The refresh task's drop is internal cleanup: the temporary table
was created by the refresh task itself moments earlier, the user is not in
control of its size, and refusing the drop just leaks disk space without
giving the user any actionable choice. The right fix is to bypass the
safety check for this one internal drop site.

dropTempTable now copies the refresh context with
Context::createCopy, pins max_table_size_to_drop = 0 and
max_partition_size_to_drop = 0 on the copy (both changed=true so they
override the server setting via the path in
MergeTreeData::checkTableCanBeDropped), and uses the copy for the
InterpreterDropQuery. Other operations in the original refresh context —
logging, dependency teardown, anything else that might run alongside —
keep their original settings.

The fix is localized to dropTempTable; the only other place
max_table_size_to_drop is checked for a refreshable MV is
dropInnerTableIfAny, called when the user explicitly DROPs the view,
which correctly continues to honour the user's policy.

Regression test

tests/queries/0_stateless/04247_104900_refreshable_mv_tmp_table_leak.sh

The test creates a 1000-row MergeTree source and a Refreshable MV whose
SELECT carries SETTINGS max_table_size_to_drop = 1. The setting is
propagated to the refresh context by applySettingsFromQuery in
prepareRefresh, so without the fix the second refresh's rotated-out target
fails to drop (size ≫ 1 byte). The test then asserts that no
.tmp.inner_id.* tables remain after two successive refreshes.

Local verification on this branch:

--no-random-settings --no-random-merge-tree-settings --test-runs 20: 20/20 pass with fix.
Default random settings, --test-runs 20: 20/20 pass with fix.
Reverting only src/Storages/StorageMaterializedView.cpp and rebuilding: test fails (1 leaked tmp inner instead of 0).

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a leak in Refreshable Materialized Views (MATERIALIZED VIEW ... REFRESH ...) where the temporary inner table that the refresh task rotates out via EXCHANGE TABLES could not be dropped if it was larger than the server's max_table_size_to_drop. Every subsequent refresh would create yet another temporary inner table while still failing to clean up the previous ones, growing the view's data directory until the disk was exhausted. The refresh task's internal drop now bypasses max_table_size_to_drop and max_partition_size_to_drop; those safety nets still apply to user-issued DROP TABLE of the view itself. Closes #104900.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

…is small When the refresh task of a `MATERIALIZED VIEW ... REFRESH ...` rotated its target table out via `exchangeTargetTable`, the previous target became a temporary inner table named `.tmp.inner_id.<uuid>` that the same refresh task then tried to drop in `StorageMaterializedView::dropTempTable`. The drop went through the regular `InterpreterDropQuery` path with the refresh context, so `MergeTreeData::checkTableCanBeDropped` honoured the server-wide `max_table_size_to_drop` / `max_partition_size_to_drop` safety nets. When the rotated-out table was larger than those limits, the drop threw `TABLE_SIZE_EXCEEDS_MAX_DROP_SIZE_LIMIT`, the temporary inner was leaked, and every subsequent refresh repeated the pattern — creating yet another fresh inner while still failing to drop the accumulated ones. The view's data directory grew until the disk was exhausted (issue ClickHouse#104900). Those safety nets exist to protect users from accidentally `DROP`ping large tables; they should not apply to the refresh task's own cleanup, where the temporary table was created by the refresh itself and the user is not in control of its size. Use a copy of the refresh context with both settings pinned to 0 for the internal drop only — other operations in the refresh context keep their original settings. Adds a regression test that uses `SETTINGS max_table_size_to_drop = 1` inside the MV's SELECT (propagated to the refresh context by `applySettingsFromQuery` in `prepareRefresh`) and verifies no `.tmp.inner_id.*` tables remain after two successive refreshes. Closes ClickHouse#104900

groeneai · 2026-05-16T04:58:28Z

Pre-PR validation gate (per task pipeline)

a) Deterministic repro? Yes — tests/queries/0_stateless/04247_104900_refreshable_mv_tmp_table_leak.sh deterministically returns 1 leaked .tmp.inner_id.<uuid> table without the fix and 0 with the fix. No timing dependency, no randomness needed; the bug fires on the second SYSTEM REFRESH VIEW.

b) Root cause explained? Yes — the refresh task's dropTempTable runs InterpreterDropQuery on refresh_context, which inherits max_table_size_to_drop from the MV's SELECT ... SETTINGS ... via applySettingsFromQuery in prepareRefresh. MergeTreeData::checkTableCanBeDropped sees the query-level setting marked changed and enforces it against getTotalActiveSizeInBytes() of the rotated-out target. Throws TABLE_SIZE_EXCEEDS_MAX_DROP_SIZE_LIMIT → leak.

c) Fix matches root cause? Yes — copy refresh_context, pin max_table_size_to_drop = 0 and max_partition_size_to_drop = 0 on the copy, use the copy for this one internal drop. The bypass is targeted at the exact safety check that triggers the leak, scoped to refresh-internal cleanup only, and leaves the user's policy in force for explicit DROP TABLE of the view itself.

d) Test intent preserved / new test added? Yes — added 04247_104900_refreshable_mv_tmp_table_leak.sh (43 lines + 2-line reference). The test exercises the exact issue-#104900 scenario (small max_table_size_to_drop + two successive refreshes) and asserts countIf(name LIKE '.tmp.%') = 0 along with the live target's row count.

e) Both directions demonstrated? Yes — verified locally on this branch:

with fix, --no-random-settings --no-random-merge-tree-settings --test-runs 20: 20/20 passes
with fix, default randomized settings, --test-runs 20: 20/20 passes
revert only src/Storages/StorageMaterializedView.cpp and rebuild: test fails with result differs (+1 instead of 0 leaked tmp inner)

f) Fix is general, not a narrow patch?

Covers both max_table_size_to_drop (DROP TABLE check) and max_partition_size_to_drop (DROP PARTITION check) — the partition variant is not exercised in the current dropTempTable (it issues a DROP TABLE only), but the unified bypass on the context copy guards against future evolution of the drop path picking up the partition check via the same mechanism.
I grepped src/Storages/StorageMaterializedView.cpp and src/Storages/MaterializedView/RefreshTask.cpp for every drop call sites: dropTempTable is the only drop the refresh task performs. dropInnerTableIfAny (user-issued DROP TABLE v) and the implicit drop in StorageMaterializedView::drop() continue to use the user's context — they correctly remain subject to the user's safety policy.
The fix is at the source of the bad state, not a defensive guard somewhere downstream. We are not silencing a symptom; the refresh-internal drop simply should never have been subject to a user-protection knob in the first place.

Session: cron:clickhouse-ci-task-worker:20260516-043000

groeneai · 2026-05-16T04:58:36Z

cc @al13n321 @tuanpach @PedroTadim — could you review this? It fixes the Refreshable MV temp-table leak you flagged in #104900. The refresh task's own dropTempTable was honouring max_table_size_to_drop against the rotated-out target, leaking one .tmp.inner_id.<uuid> per refresh once the target exceeded the limit; the fix bypasses that safety check for the refresh-internal drop only.

clickhouse-gh · 2026-05-17T23:57:11Z

Workflow [PR], commit [76673cc]

Summary: ❌

Performance Comparison: Performance dashboard

job_name	test_name	status	info
Stress test (arm_tsan)		FAIL
	Hung check failed, possible deadlock found	FAIL	cidb, issue
Stress test (amd_tsan)		ERROR

AI Review

Summary

This PR fixes the refreshable materialized view cleanup path for max_table_size_to_drop / max_partition_size_to_drop: internal temp-table cleanup now bypasses those user-protection limits for refresh-owned tables while keeping the replicated DDL metadata on the original refresh context. I found no remaining correctness issues in the current diff. The prior clickhouse-gh thread about Context::createCopy losing refresh DDL metadata is addressed by commit 76673cc1ad25f88dc765a61e6a12e9249afe6b83 and has been resolved.

Final Verdict

Status: ✅ Approve

tuanpach · 2026-05-18T02:18:32Z

@groeneai Some CI jobs failed. Please check them.

groeneai · 2026-05-18T03:34:15Z

Hi @tuanpach — I checked all 6 failing CI jobs. They are not caused by this PR. Detailed analysis below.

Summary of failed jobs

All 6 failures come from the same 2 tests:

test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db (5 of 6 jobs)
test_refreshable_mat_view_replicated/test.py::test_long_query_cancel (2 of 6 jobs)

Failing check	Failing test
`Integration tests (amd_llvm_coverage, 3/5)`	`test_long_query_cancel`
`Integration tests (amd_llvm_coverage, 5/5)`	`test_refreshable_mv_in_replicated_db`
`Integration tests (amd_msan, 5/6)`	`test_refreshable_mv_in_replicated_db`
`Integration tests (amd_tsan, 5/6)`	`test_refreshable_mv_in_replicated_db`
`Integration tests (arm_binary, distributed plan, 2/4)`	`test_long_query_cancel`
`Integration tests (arm_binary, distributed plan, 3/4)`	`test_refreshable_mv_in_replicated_db`

Failure signature

Both tests fail in the same way: a user-issued DROP TABLE times out waiting for Replicated DDL coordination across nodes, never in any code touched by this PR.

test_refreshable_mv_in_replicated_db (line 208):

drop table re.f

Error: ReplicatedDatabase DDL task /test/re_X/log/query-NNN is not finished on 1 of 2 hosts (1 of them are currently executing the task, 0 are inactive). Was waiting for 180.6 seconds, which is longer than distributed_ddl_task_timeout. (TIMEOUT_EXCEEDED)

test_long_query_cancel (line 223, in fn_setup_tables):

DROP TABLE IF EXISTS test_rmv ON CLUSTER default

Error: Timeout exceeded while receiving data from server. Waited for 300 seconds, timeout is 300 seconds.

Both errors are Keeper-coordinated DDL timeouts under slow CI runners (coverage / msan / tsan builds are 3–20× slower than normal). Neither error stack passes through StorageMaterializedView::dropTempTable, which is what this PR touches.

Why this PR cannot be the cause

This PR modifies a single internal helper:

void StorageMaterializedView::dropTempTable(StorageID table_id, ContextMutablePtr refresh_context, String & out_exception)

dropTempTable is called from exactly two sites, both in RefreshTask::executeRefreshUnlocked():

$ grep -rn "dropTempTable" src/
src/Storages/MaterializedView/RefreshTask.cpp:995:  view->dropTempTable(table_to_drop.value(), refresh_context, discard_error_message);
src/Storages/MaterializedView/RefreshTask.cpp:1020: view->dropTempTable(table_to_drop.value(), refresh_context, out_error_message);
src/Storages/StorageMaterializedView.h:150:        void dropTempTable(...)
src/Storages/StorageMaterializedView.cpp:675:      void StorageMaterializedView::dropTempTable(...)

It is never reachable from the user-facing DROP TABLE path. The user DROPs in the failing tests do not enter this function at all — they go through the standard InterpreterDropQuery → IDatabase::dropTable → StorageMaterializedView::drop chain, which is unchanged by this PR.

CIDB confirms the tests are chronically flaky on these exact jobs

Master is clean — these tests have 0 failures on master in the past 30 days:

SELECT toStartOfDay(check_start_time) as day,
       countIf(pull_request_number = 0) AS master_failures,
       countIf(pull_request_number > 0) AS pr_failures
FROM default.checks
WHERE test_name IN (
    'test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db',
    'test_refreshable_mat_view_replicated/test.py::test_long_query_cancel'
  )
  AND test_status IN ('FAIL', 'ERROR')
  AND check_start_time > now() - INTERVAL 30 DAY
GROUP BY day ORDER BY day DESC

day                   master_failures  pr_failures
2026-05-18 00:00:00   0                6   (← this PR)
2026-05-16 00:00:00   0                14  (PR #101757)
2026-05-05 00:00:00   0                2   (PR #104051)
2026-05-04 00:00:00   0                1   (PR #104051)
2026-05-01 00:00:00   0                1   (PR #103849)

The smoking gun: PR #101757 (your "KeeperDispatcher overhaul" — an entirely unrelated PR touching src/Coordination/ and contrib/NuRaft, with zero changes to refreshable MV code) hit the EXACT SAME 6 jobs with the EXACT SAME 2 tests on 2026-05-16:

Job	PR #105106 (this PR)	PR #101757 (Keeper overhaul)
`Integration tests (amd_tsan, 5/6)`	1 fail	2 fails
`Integration tests (amd_msan, 5/6)`	1 fail	2 fails
`Integration tests (amd_llvm_coverage, 5/5)`	1 fail	2 fails
`Integration tests (amd_llvm_coverage, 3/5)`	1 fail	2 fails
`Integration tests (arm_binary, distributed plan, 2/4)`	1 fail	2 fails
`Integration tests (arm_binary, distributed plan, 3/4)`	1 fail	2 fails

That identical failure footprint across two unrelated PRs, combined with zero master failures, makes this a chronic environmental flake — the tests' Replicated DDL coordination times out under slow sanitizer/coverage runners, regardless of what the PR changes.

Action requested

Could you re-run the failed CI jobs? I expect them to pass on a re-run. If they recur, I'm happy to dig into the underlying flaky timeout (the 180s distributed_ddl_task_timeout is just too tight on coverage builds for these MV-stress tests), but that would be a separate fix, unrelated to issue #104900.

Thanks!

— @groeneai (session cron:clickhouse-ci-task-worker:20260518-033000)

tuanpach · 2026-05-20T05:59:36Z

@groeneai

The CI was rerun. There are still some failures. Check them.

Pick up fix ClickHouse#105194 (02180_group_by_lowcardinality stabilization) and other master fixes to clear chronic flakes on PR ClickHouse#105106 CI.

groeneai · 2026-05-20T06:36:39Z

Hi @tuanpach, I rechecked the 9 failures on commit 30f1835b. None are caused by this PR. Just pushed a master merge to clear one of them; the other 8 are the same chronic Replicated DDL timeouts I reported earlier.

Failure breakdown

#	Failing test	Type	Status
1	`Tests/02180_group_by_lowcardinality` (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel)	Known flaky, already fixed on master by #105194	Cleared by master merge `aaf3da69` just pushed
2-9	`test_refreshable_mv_in_replicated_db` (x6) / `test_long_query_cancel` (x2)	Chronic ReplicatedDatabase DDL timeout flake	Unchanged from previous reply

(1) 02180_group_by_lowcardinality — already fixed, just needed rebase

PR Stabilize 02180_group_by_lowcardinality (non-deterministic LIMIT without ORDER BY) #105194 ("Stabilize 02180_group_by_lowcardinality") merged 4bced7eda73 at 2026-05-18 04:08:12 UTC.
The previous PR head commit 30f1835b was created at 2026-05-18 03:36:16 UTC, i.e. 32 minutes before the fix landed.
@alexey-milovidov asked me about the same failure on sibling PRs Stop clickhouse_fuzzer from self-inflicting timeouts via SIGALRM signals #105176, Web Terminal: accept cross-origin postMessage from clickhouse.com/cloud #105191, Fix for whitespace bug in LIKE formatting #101931 and the answer there was the same: rebase onto master.
I just merged current master into this branch (commit aaf3da69, pushed to fork). The .reference file is now {"count()":10} matching the post-fix format.
CIDB confirms: 0 master failures of 02180_group_by_lowcardinality in the 7 days since Stabilize 02180_group_by_lowcardinality (non-deterministic LIMIT without ORDER BY) #105194 merged.

(2) test_refreshable_mv_in_replicated_db and test_long_query_cancel — chronic, unrelated

Both tests fail with the same root cause as the previous reply: ReplicatedDatabase DDL task ... is not finished on 1 of 2 hosts ... Was waiting for 180 seconds or Timeout exceeded while receiving data from server. Waited for 300 seconds on DROP TABLE ... ON CLUSTER default. Neither stack passes through StorageMaterializedView::dropTempTable, the only path this PR touches.

CIDB cross-PR distribution (last 30 days, both tests):

test_refreshable_mv_in_replicated_db        test_long_query_cancel
PR 105106 (this PR): 11 hits across 6 checks  PR 105106: 4 hits across 2 checks
PR 101757 (al13n321 KeeperDispatcher): 10/5  PR 101757: 4 hits across 2 checks
PR 104051: 2/2                                PR 104051: 1/1
PR 103849: 1/1

The smoking gun: @al13n321's PR #101757 ("KeeperDispatcher overhaul", which only touches src/Coordination/ and contrib/NuRaft, zero refreshable MV code) hit the SAME 2 tests in the SAME pattern at the SAME rate. This confirms it is environmental flakiness in Replicated DDL coordination under slow sanitizer/coverage runners, not a refreshable MV bug.

What I just did

Pushed master merge aaf3da69 to this branch. New CI run will start automatically. Expected outcome:

02180 will pass (fix is now in branch)
The 8 Replicated DDL timeout slots may or may not flake again (environmental). If they fail again on the same 2 tests, that confirms the chronic-flake attribution and the PR is mergeable.

Could you re-trigger CI / wait for the new commit's run, and if those 2 tests still flake, mark the PR mergeable? Happy to file a separate issue for the chronic Replicated DDL timeout pattern if you think that would help (it would be a sibling to existing flaky-test tracking).

groeneai · 2026-06-04T22:00:36Z

CI ledger — covered SHA: `a9167c7289f6666b149e1eb0c1926b921f76d846`

CI fully finished (Mergeable Check completed; 0 pending checks).

Check	Test	Disposition
`Stateless tests (amd_tsan, parallel, 1/2)`	`00096_obfuscator_save_load`, `00175_obfuscator_schema_inference`	flaky chronic trunk — tasks 2026-06-02-ci-00096obfuscatorsaveload-chronic-f / 2026-06-02-ci-00175obfuscatorschemainference-ab
`Integration tests (amd_msan/amd_tsan/amd_asan_ubsan/amd_llvm_coverage/arm_binary distributed plan)` (multiple variants)	`test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db`	chronic PR-only flaky on Replicated DB integration tests; 0 master / 7 distinct PRs in 60d (incl. unrelated branches like `throw-on-unknown-config-option`, `shard-profile-events-per-cpu`). Tracked in task 2026-06-04-ci-testrefreshablemvinreplicateddb. NOT PR-caused.
`Integration tests (amd_llvm_coverage, 3/5)`, `Integration tests (arm_binary, distributed plan, 2/4)`	`test_refreshable_mat_view_replicated/test.py::test_long_query_cancel`	chronic PR-only flaky — same Replicated-DB-DDL-race family as the test above.
`Integration tests (amd_msan, 1/6)`	`test_refreshable_mat_view_replicated/test.py::test_circular_dependencies_survive_restart`	tracked in 2026-06-04-ci-p2-testcirculardependenciessurviv (chronic).
`Stateless tests (amd_tsan, s3 storage, parallel, 2/2)`	`02813_starting_in_text_log`	chronic flaky text_log timing test on s3 storage.

No PR-caused failures. The fix is small (StorageMaterializedView::dropTempTable bypasses max_table_size_to_drop for refresh-managed temp tables) and the new regression test 04247_104900_refreshable_mv_tmp_table_leak.sh exercises it cleanly. Awaiting human review (@ PedroTadim).

Session: cron:our-pr-ci-monitor:20260604-213000

…ble-mv-tmp-table-leak

groeneai · 2026-06-22T13:23:32Z

CI finished on this head (7d9308a4de9, master-merge). The 8 red checks are the same chronic environmental failures already classified on the prior commit (#105106 (comment)); none are caused by this PR (0 master hits in 30d):

test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db and test_refreshable_mat_view_replicated/test.py::test_long_query_cancel (6 Integration jobs): chronic Replicated-DB DDL-coordination flakiness. 0 master hits in 30d; recurs across unrelated PRs (a server-config-only PR hit it 18x in 60d). Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105106&sha=7d9308a4de9f45929efeb6ff2702ea582bc80ae3&name_0=PR&name_1=Integration%20tests%20%28amd_tsan%2C%204%2F6%29
Stress test (arm_asan_ubsan) Hung check: chronic, 25+ unrelated PRs + 2 master in 30d. Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105106&sha=7d9308a4de9f45929efeb6ff2702ea582bc80ae3&name_0=PR&name_1=Stress%20test%20%28arm_asan_ubsan%29

clickhouse-gh · 2026-06-23T14:11:41Z

Dear @tuanpach, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

tuanpach · 2026-06-23T14:19:17Z

@groeneai

Check the CI failure. Any PRs fixing them?

groeneai · 2026-06-23T14:50:04Z

@tuanpach Re-checked CI on the current HEAD (7d9308a). Three reds, all chronic, none caused by this PR. It only touches StorageMaterializedView::dropTempTable, which is internal to RefreshTask and is never reached by a user DROP TABLE.

test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db (5 integration jobs). Fails at drop table re.<name> with rc 159, Timeout exceeded ... 300 seconds: the Replicated-DB DDL task does not finish in time on the slow sanitizer/coverage runners. Last 30d: 0 master failures, 27 failures across 6 PRs (5 unrelated). Our merged Fix flaky test_refreshable_mv_in_replicated_db #107132 fixed a different failure mode of this test (the Refreshable view ... doesn't exist race after SYSTEM SYNC DATABASE REPLICA) and is already in this branch. No PR currently addresses this DROP-timeout mode, which is generic Replicated-DDL-coordination slowness rather than a test bug.
test_refreshable_mat_view_replicated/test.py::test_long_query_cancel (2 integration jobs). Same signature: DROP TABLE ... ON CLUSTER default times out at 300s. Last 30d: 1 master failure, 8 failures across 5 PRs. No fix PR exists.
Stress test (arm_asan_ubsan), Hung check failed, possible deadlock found. Trunk-wide chronic hung-check, not specific to this PR: last 30d = 290 master failures across 350 PRs. No single fix PR; it is a class of trunk hangs under stress.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105106&sha=7d9308a4de9f45929efeb6ff2702ea582bc80ae3&name_0=PR&name_1=Integration%20tests%20%28amd_llvm_coverage%2C%202%2F5%29

The PR is otherwise green and you approved it; these reds are known-flaky CI gating only.

…CREATE OR REPLACE prepareRefresh created the .tmp.inner_id.<uuid> table with CREATE OR REPLACE. When a leftover .tmp.inner_id.<uuid> from a previous failed refresh already existed, CREATE OR REPLACE renamed it to a _tmp_replace_<random> name and then dropped it through the normal path that honors max_table_size_to_drop. A leftover larger than that limit failed to drop and was leaked under the _tmp_replace name, and every later refresh repeated the leak until the disk filled (issue ClickHouse#104900). Drop the leftover explicitly with DROP TABLE IF EXISTS on a context that bypasses max_table_size_to_drop / max_partition_size_to_drop (same as dropTempTable), then do a plain CREATE TABLE. The async drop frees the table name synchronously on Atomic and Replicated databases, so the following CREATE does not collide. Extends the 04247 regression test with a scenario that plants a large leftover .tmp.inner_id.<uuid> before a refresh and asserts no .tmp.* or _tmp_replace_* table remains afterwards. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-29T22:49:17Z

@al13n321 Done, both in 4cecc62.

L709 comment: kept your wording (you already committed it in 518d023).
RMV change: prepareRefresh now does DROP TABLE IF EXISTS .tmp.inner_id.<uuid> (on a context that pins max_table_size_to_drop/max_partition_size_to_drop to 0, same as dropTempTable) then a plain CREATE TABLE, instead of CREATE OR REPLACE TABLE. The async drop frees the name synchronously on Atomic/Replicated, so the following CREATE does not collide.

Verified end-to-end on a debug build by planting a large leftover .tmp.inner_id.<view_uuid> and refreshing with max_table_size_to_drop = 1:

before (CREATE OR REPLACE): refresh fails Code 359 and leaks _tmp_replace_<random> (prepareRefresh -> doCreateOrReplaceTable -> InterpreterDropQuery -> checkTableCanBeDropped).
after (this change): refresh succeeds, no .tmp.* or _tmp_replace_* left behind.

Extended 04247_104900_refreshable_mv_tmp_table_leak with that scenario (passes 10/10 with randomization).

Session id: cron:clickhouse-worker-slot-4:20260629-221300

groeneai · 2026-06-29T22:49:43Z

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes. Plant a large `.tmp.inner_id.<view_uuid>` (1000 rows), set `max_table_size_to_drop = 1`, `SYSTEM REFRESH VIEW`. On the old code the refresh fails every time with Code 359 and leaks `_tmp_replace_<random>`.
b	Root cause explained?	`prepareRefresh` used `CREATE OR REPLACE TABLE`. When the target `.tmp.inner_id.<uuid>` already exists, `doCreateOrReplaceTable` renames it to `_tmp_replace_<random>` then drops it via a context that honors `max_table_size_to_drop`. A leftover larger than the limit fails `checkTableCanBeDropped`, the rename is not rolled back, and `_tmp_replace_<random>` is leaked. Every later refresh repeats it.
c	Fix matches root cause?	Yes. Drop the leftover explicitly with `DROP TABLE IF EXISTS` on a size-bypassed context (same as `dropTempTable`), then plain `CREATE TABLE`. This removes the `CREATE OR REPLACE` rename-then-size-checked-drop path entirely, so nothing is left under `_tmp_replace_*`.
d	Test intent preserved / new tests added?	Existing scenario (rotated-out drop) preserved. Added scenario B to `04247_104900_refreshable_mv_tmp_table_leak`: plants a large leftover and asserts 0 `.tmp.` and 0 `_tmp_replace_` after a refresh.
e	Both directions demonstrated?	Yes, end-to-end on a debug build. Old binary: refresh fails Code 359, leaks `_tmp_replace_*` (stack `prepareRefresh` -> `doCreateOrReplaceTable` -> `InterpreterDropQuery` -> `checkTableCanBeDropped`). New binary: refresh succeeds, 0 leaks. Test passes 10/10 with randomization.
f	Fix is general across code paths?	Yes. `prepareRefresh` is the only RMV site using `CREATE OR REPLACE` for the temp inner table; the post-exchange drop in `dropTempTable` already bypasses the size limit. Both refresh-owned temp-table paths now bypass it consistently.
g	Fix generalizes across inputs?	Yes. The leftover name is deterministic (`.tmp.inner_id.<view_uuid>`) regardless of MV column types/engine; the explicit DROP uses `if_exists=true` so it is a no-op when no leftover exists (normal case), and size-bypassed when one does. Verified the normal multi-refresh path (no leftover) stays clean.
h	Backward compatible?	Yes. Internal refresh behavior only; no setting default, on-disk/wire format, or user-visible semantics change. The temp inner table name and the resulting inner table are unchanged.
i	Invariants and contracts preserved?	Yes. The async `DROP` (`sync=false`) frees the table name synchronously on Atomic/Replicated (`DatabaseAtomic::dropTableImpl` detaches the name + erases the path map entry under lock; `sync` only gates background data cleanup keyed by UUID), so the subsequent `CREATE` cannot hit `TABLE_ALREADY_EXISTS`. Each `CREATE` uses a fresh UUID, so no `store/<uuid>/` collision. Keeps the existing "don't wait for other replicas" semantics.

Session id: cron:clickhouse-worker-slot-4:20260629-221300

…e, just bypass drop size limits The previous commit replaced the atomic CREATE OR REPLACE TABLE of the temp inner table .tmp.inner_id.<view_uuid> with a non-atomic DROP TABLE IF EXISTS (sync=false) followed by a plain CREATE TABLE. On a Replicated database that splits one atomic replicated-DDL transaction into two separate, un-waited DDL-log entries on a fixed table name, which opened a race where a replica observes an empty inner table after a refresh. That surfaced as test_refreshable_mv_in_replicated_db failing at the "select * from re.a" assertion (empty result), a signature that appeared for the first time only on that commit. Restore the atomic CREATE OR REPLACE TABLE and instead fix the original ClickHouse#104900 leak at its real cause: doCreateOrReplaceTable renames an existing leftover to _tmp_replace_<random> and drops it through a context copied from the create context, and that drop honors max_table_size_to_drop. The leak happened only because the refresh context did not bypass the size limit. Run the CREATE OR REPLACE on a context with max_table_size_to_drop=0 and max_partition_size_to_drop=0 so the internal drop of the leftover is size-bypassed, fixing the leak while keeping the create/exchange/drop in a single atomic replicated-DDL transaction. The 04247 regression test still asserts no .tmp.* and no _tmp_replace_* tables remain after a refresh that finds a large leftover; both scenarios pass. Closes: ClickHouse#104900 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-30T02:54:22Z

@al13n321 The DROP TABLE IF EXISTS + CREATE TABLE change (commit 4cecc62) caused a refresh-correctness regression in test_refreshable_mv_in_replicated_db: on a Replicated database it failed at assert select * from re.a == "0\n10\n" with an empty inner table. That assertion has never failed in 90 days of CI for this PR except on that one commit.

Cause: on a Replicated DB, CREATE OR REPLACE TABLE is a single atomic replicated-DDL transaction (the create of the temp table, the EXCHANGE rename, and the drop of the old one are committed in one ZK multi-op via the isCreateOrReplaceQuery path). Splitting it into a separate DROP TABLE IF EXISTS (sync=false) and a plain CREATE TABLE makes two independent, un-waited DDL-log entries on the fixed name .tmp.inner_id.<view_uuid>, which opens a window where a replica observes/exchanges an empty inner table.

So I kept CREATE OR REPLACE TABLE but fixed the _tmp_replace_* leak at its real cause instead: doCreateOrReplaceTable drops the renamed leftover through a context copied from the create context, and that drop honors max_table_size_to_drop. The leak happened only because the refresh context did not bypass the size limit. New commit (99b36f3) runs the CREATE OR REPLACE on a context with max_table_size_to_drop = 0 and max_partition_size_to_drop = 0, so the internal drop of the leftover is size-bypassed. This fixes the _tmp_replace_* leak while keeping the operation atomic.

Verified both directions locally on a debug build:

Without the size bypass, CREATE OR REPLACE on a 1000-row leftover with max_table_size_to_drop=1 throws code 359 and leaks _tmp_replace_<random>.
With the fix, the 04247 regression test (plant a large leftover, refresh, assert 0 .tmp.* and 0 _tmp_replace_*) passes 5/5.
test_refreshable_mv_in_replicated_db no longer hits the empty-re.a assertion locally; only the pre-existing Replicated-DDL drop timeout flakes, which is unrelated to this change.

groeneai · 2026-06-30T02:54:45Z

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes for the leak direction: `CREATE OR REPLACE TABLE t ... AS SELECT ... numbers(1000)` over a 1000-row table with `max_table_size_to_drop=1` throws code 359 and leaks `_tmp_replace_<random>` every time. The empty-`re.a` regression is a Replicated-DDL race, not deterministic; addressed by mechanism (restoring the atomic single-transaction path that was empirically clean across this PR's full CI history).
b	Root cause explained?	The prior commit split the atomic `CREATE OR REPLACE TABLE .tmp.inner_id.<uuid>` into `DROP TABLE IF EXISTS` (sync=false) + plain `CREATE TABLE`. On a Replicated DB that is two separate un-waited DDL-log entries on a fixed name instead of one atomic ZK multi-op (`isCreateOrReplaceQuery` path), so a replica can observe/exchange an empty inner table, failing `select * from re.a`. The original `_tmp_replace_*` leak (#104900) is because `doCreateOrReplaceTable`'s internal drop of the renamed leftover uses a context copied from the create context and honors `max_table_size_to_drop`, which the refresh context did not bypass.
c	Fix matches root cause?	Yes. Restore atomic `CREATE OR REPLACE` (fixes the empty-table race) and set `max_table_size_to_drop=0` + `max_partition_size_to_drop=0` on the create context so the internal leftover-drop is size-bypassed (fixes the leak). Both causes addressed directly, no band-aid.
d	Test intent preserved / new tests added?	Preserved. 04247 still plants a large leftover `.tmp.inner_id.<view_uuid>` and asserts 0 `.tmp.` and 0 `_tmp_replace_` after a refresh, plus the rotated-target Scenario A. Only the Scenario B comment was updated to describe the new mechanism; assertions unchanged.
e	Both directions demonstrated?	Yes. Without the size bypass, plain CREATE OR REPLACE leaks `_tmp_replace_*` (code 359). With the fix, 04247 passes 5/5; `test_refreshable_mv_in_replicated_db` no longer hits the empty-`re.a` assertion.
f	Fix is general across code paths?	Yes. `prepareRefresh` is the only refresh table-create site. The size bypass mirrors the existing `dropTempTable` in the same file, so the create and drop sides are now consistent. No sibling path leaks.
g	Fix generalizes across inputs?	The setting bypass is type-agnostic (it gates the drop-size check, independent of column types). Verified with MergeTree inner tables; the inner-table engine comes from the view definition and the path does not branch on datatype/wrapper.
h	Backward compatible?	Yes. No setting default, on-disk/wire format, or replication metadata change. The size-bypass applies only to the RMV-internal temp-table create context, never to user-issued DROP/CREATE.
i	Invariants and contracts preserved?	Yes. Restoring `CREATE OR REPLACE` restores the single-atomic-transaction invariant the Replicated-DDL log relies on (which the split had broken). The fresh-UUID temp table and EXCHANGE-by-name semantics are unchanged. `Context::createCopy` preserves the ZK metadata transaction, same pattern as the existing `dropTempTable`.

Session id: cron:clickhouse-worker-slot-1:20260630-014500

…e limits The previous commit ran the temp inner table CREATE OR REPLACE and the temp table DROP on a Context::createCopy(refresh_context) so it could set max_table_size_to_drop=0 / max_partition_size_to_drop=0. ContextData's copy constructor is hand-written and does not copy parent_table_uuid, ddl_query_cancellation, or ddl_additional_checks_on_enqueue, which RefreshTask sets on refresh_context before prepareRefresh. On a Replicated database the copied context therefore enqueued the refresh DDL without the parent MV UUID, the Keeper root-version check, and the DDL cancellation token, so a lagged replica could replay stale .tmp.inner_id.<uuid> entries instead of skipping them. Set the two drop-size settings on refresh_context itself instead of a copy. The settings only gate DROP size checks, refresh_context is created fresh per refresh, and dropTempTable wants the same bypass anyway, so mutating it in place is safe and matches exchangeTargetTable which already runs its interpreter on refresh_context directly. doCreateOrReplaceTable's internal drop of the renamed _tmp_replace_* leftover inherits the bypass via its create context, so the issue ClickHouse#104900 leak fix still holds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-30T04:23:13Z

Confirmed and fixed in 76673cc.

Context::createCopy runs new Context(*other), and ContextData's copy constructor (Context.cpp) is hand-written: it lists each copied member explicitly and copies metadata_transaction but omits parent_table_uuid, ddl_query_cancellation, and ddl_additional_checks_on_enqueue. So the copied context lost the refresh DDL metadata that RefreshTask sets on refresh_context (RefreshTask.cpp), and on a Replicated database the CREATE OR REPLACE / DROP enqueued without the parent UUID, the Keeper root-version check, and the cancellation token. Same problem in both prepareRefresh and dropTempTable.

Took option (1): set max_table_size_to_drop / max_partition_size_to_drop on refresh_context itself instead of a copy. The two settings only gate the DROP size check, refresh_context is created fresh per refresh, and dropTempTable wants the same bypass anyway, so mutating it in place is safe. This also matches exchangeTargetTable, which already runs its interpreter on refresh_context directly. doCreateOrReplaceTable's internal drop of the renamed _tmp_replace_* leftover inherits the bypass via its create context, so the #104900 leak fix still holds.

Verified: 04247_104900_refreshable_mv_tmp_table_leak 11/11, and test_refreshable_mv/test.py::test_refreshable_mv_in_replicated_db 3/3 (the empty-re.a line-97 path the previous commit regressed no longer reproduces).

groeneai · 2026-06-30T04:23:32Z

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes for the defect: `ContextData`'s hand-written copy ctor provably omits the three DDL fields, so `createCopy` drops them 100% of the time. The runtime symptom (empty `re.a`) is a Replicated-DDL race surfaced on the prior commit (`4cecc62`); the metadata-loss root defect is deterministic.
b	Root cause explained?	`createCopy` -> `new Context(*other)` -> hand-written `ContextData(const ContextData&)` copies an explicit member list that includes `metadata_transaction` but excludes `parent_table_uuid` / `ddl_query_cancellation` / `ddl_additional_checks_on_enqueue`. `RefreshTask` sets those on `refresh_context`; the copy default-inits them; `DatabaseReplicatedWorker` / `DDLTask` then enqueue the refresh DDL without the parent-UUID skip, Keeper version check, and cancellation token.
c	Fix matches root cause?	Yes. Set the two drop-size settings on `refresh_context` itself (no copy) in both `prepareRefresh` and `dropTempTable`, so the DDL metadata is carried into the create/drop interpreters.
d	Test intent preserved / new tests added?	Yes. `04247` still asserts 0 `.tmp.` and 0 `_tmp_replace_` after refreshes (Scenario A+B); `test_refreshable_mv_in_replicated_db` still asserts the replicated basic-refresh result. No assertion weakened.
e	Both directions demonstrated?	The prior commit (`4cecc62`, createCopy) regressed `test_refreshable_mv_in_replicated_db` line-97 (empty `re.a`, CIDB: first-ever hit, only on `4cecc62`); with this fix that test passes 3/3 and `04247` passes 11/11.
f	Fix is general across code paths?	Both lossy `createCopy` sites in the refresh path (`prepareRefresh` create, `dropTempTable` drop) are fixed. `exchangeTargetTable` already used `refresh_context` directly. No other refresh DDL goes through a copied context.
g	Fix generalizes across inputs?	The metadata-loss is input-independent (any refresh on a Replicated DB). Covers append and non-append refresh, leftover-present and leftover-absent, and the `root_znode_version == -1` (no enqueue check) branch.
h	Backward compatible?	Yes. No setting default, on-disk/wire/replication format, or new validation changed. Restores the original pre-PR behavior (interpreter on `refresh_context`) plus an in-context size bypass.
i	Invariants and contracts preserved?	Yes. `refresh_context` is created fresh per refresh, so the in-place setting change has per-refresh scope; INSERT and RENAME do not read the drop-size settings, so the create/exchange are unaffected. The Replicated-DDL contract (parent-UUID skip / Keeper version check / cancellation) is now upheld, which the copied context broke.

Session id: cron:clickhouse-worker-slot-2:20260630-035700

clickhouse-gh · 2026-06-30T07:42:09Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.40%	85.40%	+0.00%
Functions	92.70%	92.70%	+0.00%
Branches	77.70%	77.70%	+0.00%

Changed lines: Changed C/C++ lines covered: 9/10 (90.00%) · Uncovered code

Full report · Diff report

groeneai · 2026-06-30T09:24:38Z

CI finish ledger — `76673cc`

Every failure below has an owner: a fixing PR (ours or external), or a full-effort fix task whose fixing-PR link will be posted here when it opens. Only CH Inc sync is exempt.

The bot DDL-metadata fix (commit 76673cc) is clean: test_refreshable_mv_in_replicated_db and the sibling refreshable-MV integration tests passed (0 FAIL / 240180 OK across 144 jobs on this head), confirming the line-97 empty-re.a regression and the createCopy-drops-DDL-metadata issue are both resolved.

Session id: cron:our-pr-ci-monitor:20260630-090000

alexey-milovidov added the can be tested Allows running workflows for external contributors label May 17, 2026

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 17, 2026

tuanpach approved these changes May 18, 2026

View reviewed changes

Merge branch 'master' into fix-104900-refreshable-mv-tmp-table-leak

30f1835

clickhouse-gh Bot assigned tuanpach May 18, 2026

Merge branch 'master' into fix-104900-refreshable-mv-tmp-table-leak

aaf3da6

Pick up fix ClickHouse#105194 (02180_group_by_lowcardinality stabilization) and other master fixes to clear chronic flakes on PR ClickHouse#105106 CI.

janeklb mentioned this pull request May 25, 2026

Refreshable MV target tables keep stale inactive parts between refreshes and repeatedly abort background merges #105807

Open

Merge commit '3fa1ed2d9d9cc4f472cd43e2e4eaabac12ec36db' into HEAD

a9167c7

Merge remote-tracking branch 'origin/master' into fix-104900-refresha…

7d9308a

…ble-mv-tmp-table-leak

clickhouse-gh Bot unassigned tuanpach Jun 23, 2026

tuanpach self-assigned this Jun 23, 2026

al13n321 reviewed Jun 29, 2026

View reviewed changes

Comment thread src/Storages/StorageMaterializedView.cpp Outdated

Update src/Storages/StorageMaterializedView.cpp

518d023

al13n321 approved these changes Jun 29, 2026

View reviewed changes

clickhouse-gh Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread src/Storages/StorageMaterializedView.cpp Outdated

al13n321 approved these changes Jun 30, 2026

View reviewed changes

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

groeneai commented May 16, 2026

Motivation

Root cause

Approach

Regression test

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

groeneai commented May 16, 2026

Pre-PR validation gate (per task pipeline)

Uh oh!

groeneai commented May 16, 2026

Uh oh!

clickhouse-gh Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Final Verdict

Uh oh!

tuanpach commented May 18, 2026

Uh oh!

groeneai commented May 18, 2026

Summary of failed jobs

Failure signature

Why this PR cannot be the cause

CIDB confirms the tests are chronically flaky on these exact jobs

Action requested

Uh oh!

tuanpach commented May 20, 2026

Uh oh!

groeneai commented May 20, 2026

Failure breakdown

(1) 02180_group_by_lowcardinality — already fixed, just needed rebase

(2) test_refreshable_mv_in_replicated_db and test_long_query_cancel — chronic, unrelated

What I just did

Uh oh!

groeneai commented Jun 4, 2026

CI ledger — covered SHA: a9167c7289f6666b149e1eb0c1926b921f76d846

Uh oh!

groeneai commented Jun 22, 2026

Uh oh!

clickhouse-gh Bot commented Jun 23, 2026

Uh oh!

tuanpach commented Jun 23, 2026

Uh oh!

groeneai commented Jun 23, 2026

Uh oh!

Uh oh!

groeneai commented Jun 29, 2026

Uh oh!

groeneai commented Jun 29, 2026

Uh oh!

groeneai commented Jun 30, 2026

Uh oh!

groeneai commented Jun 30, 2026

Uh oh!

Uh oh!

groeneai commented Jun 30, 2026

Uh oh!

groeneai commented Jun 30, 2026

Uh oh!

clickhouse-gh Bot commented Jun 30, 2026

LLVM Coverage Report

Uh oh!

groeneai commented Jun 30, 2026

CI finish ledger — 76673cc

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

clickhouse-gh Bot commented May 17, 2026 •

edited

Loading

CI ledger — covered SHA: `a9167c7289f6666b149e1eb0c1926b921f76d846`

CI finish ledger — `76673cc`