Make row policies follow the table on RENAME, EXCHANGE and RENAME DATABASE#108478
Make row policies follow the table on RENAME, EXCHANGE and RENAME DATABASE#108478groeneai wants to merge 7 commits into
Conversation
…ABASE Row policies are keyed by (database, table). On RENAME TABLE, EXCHANGE TABLES and RENAME DATABASE the policy was left bound to the old name and never reattached, so after the operation the data was readable with no filtering under the new name, fully escaping the policy. InterpreterRenameQuery::executeToTables already re-keys table dependencies, source-view dependencies and named collections from the old StorageID to the new one, but never touched row policies. This change re-keys row policies the same way, through AccessControl, so the policy follows the data: - plain RENAME TABLE a TO b moves policies a -> b; - EXCHANGE TABLES a AND b swaps both directions (the policy follows the data, which is the only behaviour that preserves protection across the swap); - cross-database RENAME db1.a TO db2.b updates both database and table; - RENAME DATABASE moves both database-wide (ON db.*) and per-table policies; - on a failed rename the original bindings are restored in the catch block, consistent with how dependencies are already restored. The re-key is applied collision-free by routing each affected policy through a unique temporary table name first, then to its destination, so an EXCHANGE of two tables that each have a policy with the same short name does not hit a transient name collision in the access storage. Updates go through AccessControl::update by UUID, so they persist on disk and replicate like any other access-entity change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
cc @pufit @vitlibar — could you review this? Row policies are keyed by |
|
Workflow [PR], commit [0f8ce6e] Summary: ✅
AI ReviewSummaryThis PR rekeys row policies so explicit PR Metadata💡 The selected category should be Findings❌ Blockers
Tests
Final VerdictStatus: ❌ Block Minimum required actions: close the replicated-access / |
| NamedCollectionFactory::instance().renameDependencies(to_table_id, from_table_id); | ||
|
|
||
| /// Re-key row policies last: it has its own rollback, and nothing after it can throw. | ||
| applyRowPolicyRekeys(access_control, row_policy_rekeys); |
There was a problem hiding this comment.
applyRowPolicyRekeys can still throw here after database->renameTable has already committed. For example, row policies loaded from users.xml are read-only, and a policy with the same short name can already exist on the destination table name even when the table name itself is free. In those cases the outer catch restores dependencies but cannot undo the table rename; the row policy is restored/left on the old name and the renamed table is left readable without the intended filter. executeToDatabase has the same post-commit failure mode after db->renameDatabase. Please preflight all re-key failures before the table/database rename, or block the rename before the commit point when the affected policies cannot be moved.
There was a problem hiding this comment.
The read-only and final-destination cases are preflighted now, but the post-commit exception is still reachable through the phase-1 temporary name. applyRowPolicyRekeys first updates the policy to short_name ON old_db.\.tmp_rename_row_policy__0`afterrenameTablehas returned; if that row policy name already exists,AccessControl::updatethrows there. The UUID is visible insystem.row_policies.id, so this is deterministic, not just a concurrent race. The outer catch` still cannot undo the committed table/database rename, so the policy remains on the old name and the renamed table can be exposed without the intended filter. Please preflight or reserve the temporary names as well, or avoid a post-commit rename step that can still hit an access-entity name collision.
There was a problem hiding this comment.
The current code still has the same post-commit exposure class. The snapshot/preflight at src/Interpreters/InterpreterRenameQuery.cpp:410-418 is not protected against concurrent access DDL, while applyRowPolicyRekeys still performs ordinary access-storage writes after renameTable has committed at src/Interpreters/InterpreterRenameQuery.cpp:478.
Two concrete interleavings remain possible because CREATE ROW POLICY is independent of the table DDLGuard and can target non-existing table names:
- another session creates a new policy on the source table after
collectRowPolicyRekeysbut before the rename commits; it is not inrow_policy_rekeys, so it stays bound to the old name after the table moves; - another session creates a conflicting policy on the destination or transient name after
preflightRowPolicyRekeys; the post-commitAccessControl::updatethrows,restoreputs the moving policy back on the old name, and the already-renamed table is left without that filter.
Storage-write exceptions in AccessControl::update have the same shape: preflight cannot prove the later persisted updates will succeed. This needs serialization/reservation that covers access-entity changes until the row-policy re-key is complete, or a fail-closed design that does not leave protected data at a name where the original policy is not applied when a post-commit access update fails.
The row-policy re-key in InterpreterRenameQuery previously ran after database->renameTable / db->renameDatabase had already committed. If the re-key threw at that point (a policy in a read-only storage such as one loaded from users.xml, or a destination short-name collision), the surrounding catch could restore the dependency bookkeeping but could not undo the committed table/database rename, leaving the renamed object readable without its row filter. That reintroduced the very row-policy escape this change closes, on the error path. Move the re-key feasibility check ahead of the first mutation: collect the planned re-keys and preflight them (reject the rename when a policy is in a read-only storage or its destination name is already taken by a policy that is not itself moving). A rejected rename now changes nothing. The post-rename apply step keeps its own rollback for any residual error. Add regression coverage for both unmovable-policy paths: a read-only users.xml policy (via clickhouse-local) and a destination row-policy name collision. Both must fail the rename cleanly, leaving the table in place and the policy still filtering. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed the post-commit failure path in 569f980. The re-key now runs in two stages. Regression tests cover both unmovable-policy paths: a read-only |
Follow-up to the preflight commit, addressing two review findings. 1. Cross-database db.* policy. collectRowPolicyRekeys matched only an exact table_name, so a database-wide `CREATE ROW POLICY ... ON db1.*` was not carried when a table moved out of db1 (RENAME db1.t TO db2.t, or cross-db EXCHANGE). EnabledRowPolicies::getFilter then looks up db2.t and db2.*, never the old db1.*, so the moved data was readable unfiltered or under an unrelated destination policy. A db.* policy governs every table in db1 and cannot follow a single table out, so reject such cross-database moves/exchanges (NOT_IMPLEMENTED) instead of silently dropping the filter. Same-database renames are unaffected: the db.* policy keeps covering the table via the ANY_TABLE_MARK fallback. 2. Phase-1 transient name. applyRowPolicyRekeys parks each policy under `.tmp_rename_row_policy_<uuid>_<i>` before moving it to the destination, and that step runs after renameTable/renameDatabase has committed. The name is derived from the policy UUID (visible in system.row_policies.id), so a pre-existing policy can occupy it deterministically; AccessControl::update would then throw post-commit and the outer catch cannot undo the committed rename, leaving the table unfiltered. Extend preflightRowPolicyRekeys to also reject when the transient name is taken by a non-moving policy, so the whole re-key is proven applicable before the rename commits. The temp-name format is factored into tempRekeyTableName so preflight and apply cannot drift. Regression tests added to 04401 for both: cross-database RENAME with a db.* policy is rejected (table not moved, source still filtered), and a transient-name collision is rejected pre-commit (table stays put, policy still filters). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Both findings fixed in 779feb3.
Regression tests added to |
|
Pre-PR validation gate for the follow-up commit 779feb3 (two review findings). Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-3:20260625-151900 |
|
Confirmed, and this is now an atomicity question rather than a missing guard. The remaining exposure is structural to I do not want to pick the locking model unilaterally on a security PR, so flagging the fork for @pufit @vitlibar to decide:
My read: option 1 is the only complete close but is an access-subsystem change; option 3 is the smallest fail-closed option. Is fully closing this race in scope for this patch, or acceptable as a follow-up given it matches the existing post-commit re-key semantics for the other bindings? Will implement whichever you prefer. |
The regression test grew across review rounds to eight cases (RENAME, EXCHANGE, cross-DB rename, RENAME DATABASE, and the destination/read-only/cross-DB/transient-name rejection paths), two of which start a separate clickhouse-local server. A single run is cheap, but under the flaky-check job's repeated parallel runs on sanitizer builds the per-run wall clock exceeds the 180s soft cap, so the flaky check fails it with TOO_LONG. The long tag exempts the test from that 180s flaky-check cap and reduces its flaky-check repeat count from ~50 to ~5 (long_test_runs_ratio), while still running it once on every regular and per-test-coverage lane (only the LLVM aggregation and fast-test lanes pass --no-long). No assertion is weakened: each run still executes all eight security cases. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
The flaky check failed this PR's own test 04401_row_policy_follows_table_rename with TOO_LONG (> 180s). The test grew to eight cases across review rounds (two start a separate clickhouse-local server), so under the flaky check's repeated parallel runs on sanitizer builds a single run crosses the 180s soft cap. Tagged the test The outstanding atomicity/TOCTOU design question from the 16:19 UTC review is unaffected and still awaits a maintainer's pick on the locking-vs-fail-closed fork. |
The row-policy re-key added for RENAME/EXCHANGE also fired for the synthetic exchange that CREATE OR REPLACE TABLE / REPLACE TABLE build internally (InterpreterCreateQuery::doCreateOrReplaceTable), where from = a freshly built temporary table and to = the existing target. The unconditional destination re-key moved the target table's row policy onto that temporary name, which is dropped right after the swap, leaving the replaced table readable with no filter. That reintroduced the row-policy escape this code closes, for replacement DDL. Mark the synthetic rename with an internal create_or_replace flag and skip the row-policy re-key for it: only the target's storage is replaced, the target keeps its name, so the policies bound to that name stay put and filter the new data. Explicit RENAME/EXCHANGE still re-key as before. The flag is rebuilt in-process on every replica via doCreateOrReplaceTable, so it is preserved for Replicated databases and ON CLUSTER. Covered by two new cases in 04401_row_policy_follows_table_rename: an existing protected table replaced via CREATE OR REPLACE TABLE and via REPLACE TABLE, each asserting the policy stays bound to the target name and keeps filtering. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Fixed in 41e138b. CREATE OR REPLACE TABLE / REPLACE TABLE reach the rename via the synthetic exchange built by I added an internal Verified on a debug build: your |
The CREATE OR REPLACE TABLE / REPLACE TABLE cases run as the restricted user and specify ENGINE = MergeTree. The default test config enables table_engines_require_grant, so they need an explicit TABLE ENGINE grant; without it the statement is rejected with ACCESS_DENIED. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Fixed the deterministic Root cause: those two cases run as the restricted test user and specify Fix: add Verified against the default test config ( |
CI finish ledger — 3f8b2ddEvery failure below has an owner: a fixing PR (ours or external), or
Session id: cron:our-pr-ci-monitor:20260626-050000 |
| /// Re-key row policies last. Preflight above already rejected the unrecoverable cases | ||
| /// (read-only storage, destination or transient-name collision); this still has its own | ||
| /// rollback for any residual error, and nothing after it can throw. | ||
| applyRowPolicyRekeys(access_control, row_policy_rekeys); |
There was a problem hiding this comment.
This is not safe for Replicated databases when the affected row policies live in ReplicatedAccessStorage. The table rename is applied by each replica's replicated-DDL worker, but this AccessControl update writes the row-policy name to the shared access storage as soon as the first replica reaches this line. A lagging replica can then receive rp ON new_name while its local table is still old_name; queries there lookup old_db.old_table and then old_db.*, so the still-old table is temporarily unfiltered until that replica processes the DDL.
Please either coordinate the row-policy move with the replicated database DDL so all replicas stay protected, keep both old and new names protected until every replica has processed the rename, or reject this operation for affected policies in replicated access storage / Replicated databases. A focused integration test with <user_directories><replicated> and a lagging database replica would cover the invariant.
There was a problem hiding this comment.
Confirmed. ReplicatedAccessStorage propagates the re-key on its own ZooKeeper path (global, immediate), while the table rename propagates through the Replicated database DDL queue (per-replica, async); the two channels are not synchronized, so a lagging replica briefly sees rp ON new_name with its local table still at old_name.
Scoping note: on master the policy does not follow the table at all, so a renamed table in a Replicated database stays unfiltered permanently. This PR reduces that to a transient window on a lagging replica only, so it does not introduce the exposure; fully closing the window is the multi-node face of the atomicity question already open at #108478 (comment).
On the fix: a fail-closed reject has to be pre-enqueue (DatabaseReplicated::checkQueryValid, next to the existing DETACH reject). A throw at this line runs inside replicated DDL execution after the entry is committed to ZooKeeper, so it cannot cleanly reject on a lagging replica. Of the three options, coordinate and dual-protect are architectural because ReplicatedAccessStorage policies are globally shared (there is no per-replica re-key and no all-replicas-caught-up barrier today); reject is the only option needing no new infrastructure but is backward-incompatible. Folding this into the same pending design decision rather than shipping a backward-incompatible reject unilaterally on a security PR.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 112/152 (73.68%) | Lost baseline coverage: none · Uncovered code |
|
📊 Cloud Performance Report ✅ AI verdict: This PR only changes RENAME TABLE / EXCHANGE / CREATE OR REPLACE DDL handling so that row policies follow the table (plus a new stateless test); the public and private diffs are identical and touch no query-execution code. The three flagged ClickBench improvements (Q4 -15.4%, Q15 -12.2%, Q28 -6.5%) exercise read-path SELECTs that never run any of the modified code, so these consistent-looking deltas are run-to-run variance, not a real effect of this change. All three have been downgraded to not-sure. No performance impact is expected from this PR. clickbenchFlagged queries (3 of 43)q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_official🟢 No significant changes Debug info
|

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a row-policy escape:
RENAME TABLE,EXCHANGE TABLESandRENAME DATABASEleft row policies bound to the old name, so after the operation the table was readable with no filtering under its new name. Row policies now follow the table/database on rename.Description
Row policies are access entities keyed by
(database, table_name)(RowPolicyName). When a table was renamed the policy stayed bound to the old name and was never reattached to the new name, so after the rename ALL rows became visible under the new name, fully escaping the policy.EXCHANGE TABLESandRENAME DATABASE(forON db.*and per-table policies) had the same effect.InterpreterRenameQuery::executeToTablesalready carefully re-keys every other kind of table binding from the oldStorageIDto the new one (referential/loading/MV dependencies, source-view dependencies, named collections), but never touched row policies. This PR re-keys row policies the same way, throughAccessControl, so the policy follows the data:RENAME TABLE a TO bmoves policiesa -> b;EXCHANGE TABLES a AND bswaps both directions. The policy follows the data, which is the only behaviour that preserves protection across the swap (otherwise exchanging a protected table with an unprotected one would move the protected data to a name with no policy);RENAME db1.a TO db2.bupdates both database and table;RENAME DATABASEmoves both database-wide (ON db.*) and per-table policies;catchblock, consistent with how dependencies are already restored.The re-key is applied collision-free by routing each affected policy through a unique temporary table name first, then to its destination, so an
EXCHANGEof two tables that each have a policy with the same short name does not hit a transient name collision in the access storage. Updates go throughAccessControl::updateby UUID, so they persist on disk and replicate like any other access-entity change.Reproduction (100% on a clean checkout, with
access_managementenabled):Regression test
04401_row_policy_follows_table_renamecovers RENAME, EXCHANGE (including same-short-name and asymmetric cases), cross-database RENAME, RENAME DATABASE, and the rollback-on-failure path.Notes for reviewers (cc access-control maintainers)
RENAME/EXCHANGEwhile active policies exist. Re-keying preserves protection without breaking legitimate workflows (e.g. tenants renaming their own tables), so it seems clearly preferable, but please confirm this is the behaviour you want.Bug Fix. This is an RBAC escape, so it may warrantCritical Bug Fix (crash, data loss, RBAC)instead. Happy to switch the category if you agree.Replicateddatabase the rename is enqueued and each replica re-runs the interpreter, so the re-key runs once per replica. Because it targets policies by UUID with absolute destination names, repeated application is idempotent; the redundant access-storage writes mirror how the existing dependency re-keying already behaves per-replica.