Supports automatically fill on cluster value#94138
Conversation
|
This needs more thought.
|
Thanks a lot, I will further optimize these two points later. Then I will add a test case |
aefd434 to
0d9467d
Compare
|
Workflow [PR], commit [6548354] Summary: ❌
AI ReviewSummaryThis PR adds an experimental setting pair that rewrites eligible initial queries with a missing Findings
Tests
Final VerdictStatus: Minimum required actions: fix the new |
232bf45 to
89c6244
Compare
c9cfebb to
1746eae
Compare
1746eae to
d77ffbd
Compare
Maybe we can use |
optimize code optimize name rename test file fix check style bug fix test case fix test case # Conflicts: # src/Interpreters/executeQuery.cpp
d77ffbd to
c1629fd
Compare
This settings has now been marked as experimental, and a |
…ewrite-on-cluster
…cases Address the remaining review threads about the auto-fill blocklist. Each case either turned a valid local query into an exception, or skipped a statement that ON CLUSTER actually supports: - UDF / workload / resource DDL: these reject `ON CLUSTER` only when their storage `isReplicated()`. With the default, non-replicated storage `ON CLUSTER` is valid and distributes the statement, so auto-fill must fill it. Gate the skip on `getUserDefinedSQLObjectsStorage().isReplicated()` / `getWorkloadEntityStoragePtr()->isReplicated()` instead of skipping these families unconditionally. - Unsupported `ALTER` commands: `executeDDLQueryOnCluster` rejects `ATTACH PARTITION` / `FETCH PARTITION` with `Unsupported type of ALTER query`. Filling the cluster for such an `ALTER` would make a valid local statement throw, so skip auto-fill when any command is not supported `ON CLUSTER` (reusing `isSupportedAlterTypeForOnClusterDDLQuery`). - Database-level `ALTER` / `RENAME`: `ALTER DATABASE ... MODIFY COMMENT/SETTING` and `RENAME DATABASE` are not coordinated by the `Replicated` database engine (`InterpreterAlterQuery::executeToDatabase` / `InterpreterRenameQuery::executeToDatabase` apply them on the initiator only). Like `DROP DATABASE`, they need `ON CLUSTER` to reach every node, so a `Replicated` target database must no longer be a reason to skip auto-fill for them. Table-level statements still stay local. Extend `03789_automatic_fill_on_cluster_mode` (functions now rewritten ON CLUSTER; `ATTACH PARTITION` stays local) and `03789_automatic_fill_on_cluster_replicated_database` (`ALTER DATABASE` / `RENAME DATABASE` are rewritten ON CLUSTER). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| bool is_database_level_alter | ||
| = alter_query && alter_query->alter_object == ASTAlterQuery::AlterObjectType::DATABASE; | ||
| if (!is_database_level_alter) | ||
| target_is_replicated_database = is_replicated_database(query_with_table->getDatabase()); |
There was a problem hiding this comment.
This Replicated-database skip is too broad for table-level ALTER commands that the database engine explicitly does not replicate.
Concrete trace: ALTER TABLE repl_db.t DROP PARTITION ... reaches this branch, is_replicated_database returns true, and auto-fill leaves cluster empty. But DatabaseReplicated::shouldReplicateQuery returns false for DROP PARTITION/DROP DETACHED PARTITION, FREEZE/UNFREEZE, and UNLOCK SNAPSHOT, while executeDDLQueryOnCluster only rejects ATTACH PARTITION and FETCH PARTITION. So a valid ON CLUSTER operation is silently kept local instead of being routed to the configured cluster.
Please base this skip on the specific operation being database-replicated, not just on the database engine name. The existing ATTACH/FETCH guard can still keep unsupported ON CLUSTER alters local.
There was a problem hiding this comment.
Not changed yet — this one looks like a genuine semantic call rather than a mechanical fix, so flagging it for @alexey-milovidov.
DatabaseReplicated::shouldReplicateQuery returns false for DROP PARTITION / DROP DETACHED PARTITION / FREEZE/UNFREEZE / UNLOCK SNAPSHOT precisely because, for a ReplicatedMergeTree table, those operations are replicated by the table's own machinery (its replication log), not by the database engine. So the current "keep local" behavior is not a silent loss: the partition drop still reaches the other replicas of the table. Routing such an ALTER ON CLUSTER would push it to every node of the configured cluster in addition to the table's own replication, which for a Replicated database whose replica set does not match the cluster topology can double-apply or target nodes that do not host the table.
There is also a practical obstacle to basing the skip on shouldReplicateQuery from executeQuery (before interpretation): it resolves the storage via getTable (throws for IF EXISTS / not-yet-existing targets) and depends on runtime state (is_replicated_database_internal, has_many_shards, KeeperMap checks) that is not necessarily settled at the auto-fill point.
So whether to route these non-database-replicated table ALTERs ON CLUSTER is a behavioral decision (and the conservative current behavior may be the correct one for ReplicatedMergeTree). Leaving the thread open for your call; the other two blockers from this review round are addressed in the two commits just pushed.
The previous code skipped auto-fill for the whole RENAME on the first `Replicated`-database element, silently leaving ordinary elements local. For a mixed statement such as `RENAME TABLE ordinary_db.o TO ordinary_db.o2, repl_db.r TO repl_db.r2` this leaves `cluster` empty: `InterpreterRenameQuery` then applies the ordinary rename on the initiator and throws on the unsupported multi-table `Replicated` rename, leaving the ordinary table renamed locally. Mirror the multi-table `DROP` guard: classify every rename element and reject a genuinely mixed statement (a `Replicated`-database element alongside an ordinary one) with `BAD_ARGUMENTS`, since a single `cluster` field cannot express both intents. Pure-ordinary renames are still auto-filled and pure-replicated renames are still skipped. The rejection happens before any rename is applied, so no ordinary target is changed locally. Added `Test 9` to `03789_automatic_fill_on_cluster_replicated_database` covering this. Addresses the AI review thread on `src/Interpreters/executeQuery.cpp`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed collections Access-control queries (`CREATE USER`/`ROLE`/`QUOTA`/`ROW POLICY`/`SETTINGS PROFILE`, `DROP` of access entities, `GRANT`) and named-collection queries (`CREATE`/`ALTER`/`DROP NAMED COLLECTION`) accept `ON CLUSTER` only when their storage is not replicated, exactly like the user-defined functions, workloads and resources already handled here. With replicated access storage a local `CREATE USER u` succeeds and replicates through `ReplicatedAccessStorage`, but auto-fill turned it into `CREATE USER u ON CLUSTER ...`, which the interpreter rejects unless `ignore_on_cluster_for_replicated_access_entities_queries` is enabled (and which would just be stripped back when it is). Replicated named collections have the same shape with `ignore_on_cluster_for_replicated_named_collections_queries`. Decide per storage using the same predicates as `removeOnClusterClauseIfNeeded`: skip auto-fill when the access storage contains a `ReplicatedAccessStorage`, or when the named-collection factory uses replicated storage. Non-replicated storage keeps `ON CLUSTER` valid and eligible for auto-fill. Addresses the AI review thread on `src/Interpreters/executeQuery.cpp`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked this up to address the latest AI review round (the three new threads on Addressed two of the three blockers (resolved their threads):
Left the third blocker open for your call ( Test gap (the AI’s third CI. The pre-existing reds on Build note. I could not run a full build in this environment (disk full), so the two C++ changes were verified with |
The automatic `ON CLUSTER` fill decision in `executeQueryImpl` read `allow_experimental_automatic_fill_on_cluster_mode` and `cluster_for_automatic_fill_mode` before `InterpreterSetQuery::applySettingsFromQuery` had extracted query-level settings from the AST. As a result the feature could not be switched per query: with the profile setting off, a statement carrying `SETTINGS allow_experimental_automatic_fill_on_cluster_mode = 1` stayed local, and with the profile setting on, `SETTINGS allow_experimental_automatic_fill_on_cluster_mode = 0` still injected `ON CLUSTER`. Apply the query's `SETTINGS` clause before the auto-fill decision (and before the corresponding interpreter), so query-level settings take effect. The `settings` reference points at the live context settings, which `applySettingsFromQuery` mutates in place, so the decision below now observes the per-query values. Add `03789_automatic_fill_on_cluster_query_settings` covering both the per-query enable and per-query disable cases. Addresses the AI review (Request changes) on ClickHouse#94138 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ewrite-on-cluster
|
Picked this up to address the AI review (Request changes) about query-level Setting-ordering fix ( New test Verification against a locally built server (embedded Keeper +
Also merged latest |
…ewrite-on-cluster
After merging `master`, a new `26.7` version block was opened. The settings `allow_experimental_automatic_fill_on_cluster_mode` and `cluster_for_automatic_fill_mode` were registered in the `26.6` block, but `02995_new_settings_history` requires settings absent from the `26.6.1` baseline to appear in a version block greater than `26.6`. Move both entries to the `26.7` block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged the current One follow-up was needed: Verified The two reds on the previous run were both confirmed master-wide flakes unrelated to this PR (off-by-default experimental feature): |
| || out_ast->as<ASTCreateSettingsProfileQuery>() || out_ast->as<ASTDropAccessEntityQuery>() | ||
| || out_ast->as<ASTGrantQuery>()) | ||
| target_forbids_on_cluster | ||
| = context->getAccessControl().containsStorage(ReplicatedAccessStorage::STORAGE_TYPE); |
There was a problem hiding this comment.
This predicate is too coarse for access-control queries that explicitly target a non-replicated storage. In a server that has both replicated access storage and a local storage, CREATE USER u IN local_directory (or DROP USER u FROM local_directory) enters this branch, sees that some replicated access storage exists, and leaves cluster empty. The query then runs only on the initiator even though explicit CREATE USER u IN local_directory ON CLUSTER <cluster> is valid and would create the user in the local storage on every host.
That breaks the setting's contract for an eligible statement and can leave access configuration divergent across the configured cluster. Please base the skip on the target storage when storage_name is specified (for example, getStorageByName(...)->isReplicated()), and keep the broad conservative check only for statements whose target storage is implicit.
There was a problem hiding this comment.
This predicate is intentionally identical to the canonical removeOnClusterClauseIfNeeded (isAccessControlQuery + getAccessControl().containsStorage(ReplicatedAccessStorage::STORAGE_TYPE)), which is the single source of truth for which ON CLUSTER access queries are considered managed by replicated storage. Making auto-fill finer-grained than that — resolving an explicit storage_name and checking only that storages isReplicated— would diverge from the canonical function: for consistencyremoveOnClusterClauseIfNeededwould have to gain the same per-storage resolution, otherwise an explicitCREATE USER u IN local_directory ON CLUSTER ...would still be stripped back to local (whenignore_on_cluster_for_replicated_access_entities_queries` is enabled) even though auto-fill had added it.
Note the failure mode here is conservative: the query stays local and the user can still add ON CLUSTER explicitly — unlike the temporary-table finding, which actively mis-routed a valid local statement through distributed DDL. It also cannot be exercised by a plain stateless .sql: it requires a server configured with both a replicated and a local access storage.
So whether to make the access-storage predicate target-aware (and align removeOnClusterClauseIfNeeded with it for consistency) is a behavior/design decision rather than a mechanical parity fix. Flagging for @alexey-milovidov / @JingYanchao to decide; leaving this thread open.
…ary table `ALTER TABLE t` and `OPTIMIZE TABLE t` with an omitted database resolve to a session-local temporary table even without the `TEMPORARY` keyword: the interpreters resolve the unqualified name via `Context::ResolveAll`, and the parser does not set `isTemporary` for them. The automatic `ON CLUSTER` fill in `executeQueryImpl` only checked the parser flag in the `ASTQueryWithTableAndOutput` branch, so with the feature enabled such a statement got a cluster filled and was routed through distributed DDL, leaving the session table untouched. Reuse the existing `resolves_to_temporary` helper (hoisted out of the `DROP` branch) for the `ALTER`/`OPTIMIZE` branch too, guarding on a present table so a database-level statement does not construct an empty `StorageID`. Extend `03789_automatic_fill_on_cluster_mode` with a regression covering `ALTER TABLE` and `OPTIMIZE TABLE` on a session temporary table. Addresses the AI review finding on `executeQuery.cpp` (temporary-table guard). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
After the master-merge earlier today, the AI review re-ran on the merged head and flipped to Request changes with two findings on Fixed: temporary-table guard for Left open for your call: the access-storage predicate ( Verification. The shared build directory here is in a divergent transitional state (a 4k-file rebuild), so I did not do a full local build; the change was verified with CI on the prior head ( No approval added — not my PR. |
…ewrite-on-cluster
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 115/127 (90.55%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code |
…porary The temporary-table resolution heuristic for the automatic `ON CLUSTER` fill was applied to every `ASTQueryWithTableAndOutput`, including `ASTCreateQuery`. A persistent `CREATE TABLE t` / `CREATE VIEW t` creates a new object in the current database; it does not target a session temporary table just because one named `t` already shadows it (a temporary and a persistent table of the same name can coexist). `InterpreterCreateQuery` treats a non-`TEMPORARY` create as persistent and fills the current database, so it is an eligible statement that should be distributed `ON CLUSTER`. With the heuristic applied, a shadowing temporary table made auto-fill leave `cluster` empty and the persistent object was created only on the initiator. Exclude `ASTCreateQuery` from the resolution heuristic. A `CREATE TEMPORARY TABLE` is still kept local because `isTemporary` is already checked above. The remaining statements that reach this branch (`ALTER`, `OPTIMIZE`, `UPDATE`, `DELETE`, `CREATE`/`DROP INDEX`) operate on an existing table by name, so the heuristic remains correct for them. Add Test 11 to `03789_automatic_fill_on_cluster_mode`: a persistent `CREATE TABLE` and `CREATE VIEW` shadowed by same-named temporary tables are rewritten `ON CLUSTER` and the persistent objects are created. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review Request changes finding on Fixed: persistent Added Verification. The remaining open thread (access-storage predicate at |
There was a problem hiding this comment.
This access-entity classification does not cover ASTMoveAccessEntityQuery, so MOVE USER / MOVE ROLE / similar access-entity moves remain eligible for auto-fill even when replicated access storage is configured.
On a server with both replicated access storage and a local storage, MOVE USER u TO local_directory enters this block with an empty cluster, gets ON CLUSTER injected, and then InterpreterMoveAccessEntityQuery takes the distributed DDL branch. Each worker runs moveAccessEntities, which removes the entity from the source storage before inserting it into the destination; when the source is replicated storage, one worker can delete the shared ZooKeeper entity before the others resolve it, or only one host can end up with the local copy. That turns an ordinary local access-maintenance statement into a distributed operation that can fail or leave access storage divergent.
Please classify ASTMoveAccessEntityQuery before assigning cluster: either skip it whenever replicated access storage is present, or resolve the source and destination storages and auto-fill only when the move is safe to run independently on every host.
The last verification of Test 11 is written as
SELECT '...both persistent objects exist in the current database', count() = 2 FROM system.tables ...
so its second column is the boolean `count() = 2`, which is `1` when the two
persistent objects exist. The reference, however, recorded `2` (as if the query
selected the raw `count()`), so the test failed with
-Test 11 verification: both persistent objects exist in the current database 2
+Test 11 verification: both persistent objects exist in the current database 1
The feature is correct: with `allow_experimental_automatic_fill_on_cluster_mode`
both the persistent `CREATE TABLE` shadowed by a temporary table and the
persistent `CREATE VIEW` are rewritten `ON CLUSTER` and both persistent objects
are created (verified against a local server with `test_shard_localhost` and
embedded Keeper: raw `count()` is `2`, so `count() = 2` yields `1`). Only the
reference was wrong.
Fix the reference to `1`, matching the actual output and the test's own
convention for combined label + boolean rows (Test 10, e.g. `count() = 2` -> `1`).
CI report: ClickHouse#94138
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add two new settings allow_experimental_automatic_fill_on_cluster_mode & cluster_for_automatic_fill_mode to support filling ON CLUSTER automatically. This can simplify maintenance when we always only have one cluster.
Note
Medium Risk
Touches core query execution to mutate the parsed AST and inject
ON CLUSTERfor eligible statements, which could change DDL routing/behavior when enabled. Risk is mitigated by being opt-in via new experimental settings and by skipping Replicated databases and non-initial queries.Overview
Adds an experimental mode to automatically fill a missing
ON CLUSTERclause from a configured default cluster.Introduces new settings
allow_experimental_automatic_fill_on_cluster_modeandcluster_for_automatic_fill_mode, updatesexecuteQueryto inject the cluster intoASTQueryWithOnClusterwhen enabled (but skips queries targetingReplicateddatabases), and adds stateless tests to verify both the injection behavior and the Replicated-database exclusion.Reviewed by Cursor Bugbot for commit 9829571. Bugbot is set up for automated code reviews on this repo. Configure here.