Fix settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY#105111
Fix settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY#105111groeneai wants to merge 21 commits into
Conversation
…ORDER BY
`StorageMergeTree::alter` invoked `verifySortingKey` unconditionally before
the early-return branches for `isSettingsAlter` and `isCommentAlter`. That
caused unrelated ALTERs to fail with `DATA_TYPE_CANNOT_BE_USED_IN_KEY` on
tables that were originally created with `allow_suspicious_primary_key = 1`
once that session-level setting was no longer in effect, for instance:
SET allow_suspicious_primary_key = 1;
CREATE TABLE t (key Int, value SimpleAggregateFunction(sum, UInt64))
ENGINE = AggregatingMergeTree() ORDER BY (key, value);
SET allow_suspicious_primary_key = 0;
ALTER TABLE t MODIFY SETTING index_granularity = 8192;
-- Code: 549. DB::Exception: Column with type
-- SimpleAggregateFunction(sum, UInt64) is not allowed in key expression.
`StorageReplicatedMergeTree::alter` already runs the verification after the
settings-only and comment-only early returns, so the two engines disagree.
Move the call into the same position in `StorageMergeTree::alter`. Settings
and comment ALTERs cannot change the sorting key, so re-verifying it there
is unnecessary; ALTERs that can change the key still verify it.
A stateless regression test in `04248_104463_*` covers `MODIFY SETTING`,
`RESET SETTING`, `MODIFY COMMENT`, `COMMENT COLUMN`, the rejected
`ADD COLUMN` path, and the same `ADD COLUMN` succeeding when the user
re-enables `allow_suspicious_primary_key`.
Closes ClickHouse#104463
|
Pre-PR validation gate (worker) (a) Deterministic repro? Yes — running the exact (b) Root cause explained? (c) Fix matches root cause? Yes — the (d) Test intent preserved / new tests added? Existing tests are untouched. A new stateless test (e) Both directions demonstrated? Yes. Locally, on a debug build of the patched branch with the fix reverted (via (f) Fix is general, not a narrow patch? session: cron:clickhouse-ci-task-worker:20260516-100000 |
|
Workflow [PR], commit [b81feda] Summary: ✅
AI ReviewSummaryThis PR now gates Final VerdictStatus: ✅ Approve |
…ting sorting key
`AlterCommands::isSettingsAlter` and `AlterCommands::isCommentAlter` are
both `all_of` checks, so a mixed non-replicated statement like
ALTER TABLE t MODIFY COMMENT 'x', MODIFY SETTING merge_with_ttl_timeout = 60;
falls into the trailing `else` of `StorageMergeTree::alter` and still
runs `MergeTreeData::verifySortingKey`. On a table created with
`allow_suspicious_primary_key = 1` that still raises
`DATA_TYPE_CANNOT_BE_USED_IN_KEY` even though neither subcommand can
change `ORDER BY`. This is the same gap the original PR closed for the
pure-settings and pure-comment shapes; it just survives the `all_of`
predicates whenever both kinds of subcommands appear together.
Use the existing `AlterCommands::areNonReplicatedAlterCommands` helper
(`all_of (isSettingsAlter || isCommentAlter)`) to add a dedicated
metadata-only branch in both `StorageMergeTree::alter` and
`StorageReplicatedMergeTree::alter`, mirroring the existing pure-shape
branches. Statements that include a sorting-key-relevant subcommand
(e.g. `ADD COLUMN`) still skip the new branch and fall through to the
existing key-verifying path.
Extend the regression test for issue ClickHouse#104463 with mixed-command cases:
* `MODIFY COMMENT … , MODIFY SETTING …`
* `COMMENT COLUMN … , MODIFY SETTING …`
* `MODIFY COMMENT … , RESET SETTING …`
* a `MODIFY COMMENT … , ADD COLUMN …` negative case that must still be
rejected with `DATA_TYPE_CANNOT_BE_USED_IN_KEY`.
|
Pushed commit 52c3849 addressing the mixed-command gap flagged by Bot's concern, restated
ALTER TABLE t MODIFY COMMENT 'x', MODIFY SETTING merge_with_ttl_timeout = 60;contains commands that satisfy one predicate each but not both, so both helpers return false. The mixed shape then fell through to the trailing (Note: FixAdded a dedicated TestsExtended
Pre-PR validation gate
Verification
Session: |
|
This was fixed by #105146. Let's update the branch. |
|
@groeneai check the CI failures, maybe another merge with master is needed |
|
@PedroTadim master merged in The only failing job was
Neither test touches The mixed-command |
The 2026-05-27 master merge into this PR branch accidentally undid the
master-side changes to three files unrelated to this PR's StorageMergeTree
ALTER fix:
- src/Functions/tests/gtest_functions_stress.cpp: re-introduced the
per-iteration max_execution_time / CancellationChecker /
ProcessList wiring (originally landed in ClickHouse#105146, reverted by
ClickHouse#105163, re-landed on master). Without this, function_prop_fuzzer
iterations cannot be interrupted and the job regresses to long/stuck
behavior.
- src/Interpreters/CancellationChecker.cpp: re-introduced the
'stop_thread = false' reset at the end of workerFunction so the
singleton can be restarted in tests.
- tests/queries/0_stateless/03357_join_pk_sharding.sql: re-added the
'no-tsan' tag landed by ClickHouse#105902.
Reverting these hunks restores the master versions so this PR only ships
the intended StorageMergeTree / StorageReplicatedMergeTree areNonReplicatedAlterCommands
branch and the new 04248 regression test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`DDLWorker::taskShouldBeExecutedOnLeader` uses `ASTAlterQuery::isSettingsAlter` and `isCommentAlter`, both of which are `isOneCommandTypeOnly` checks. A mixed batch such as `ALTER TABLE t ON CLUSTER ... MODIFY COMMENT 'x', MODIFY SETTING old_parts_lifetime = 123` matches neither, so the routing fell through to the default leader-only path. The storage-side fast path in `StorageReplicatedMergeTree::alter` for `areNonReplicatedAlterCommands` applies the change as local metadata-only work without writing a replicated log entry, so leader-only routing leaves the followers diverged. Add a per-command predicate `ASTAlterQuery::isSettingsOrCommentAlter` that returns true when every command in the batch is `MODIFY SETTING`, `RESET SETTING`, `COMMENT COLUMN`, or `MODIFY COMMENT` (the AST-side equivalent of `AlterCommands::areNonReplicatedAlterCommands`), and wire it into the routing disjunction so mixed batches are executed on every replica. Add `tests/integration/test_alter_settings_or_comment_on_cluster` covering the mixed `MODIFY COMMENT` + `MODIFY SETTING`, mixed `MODIFY COMMENT` + `RESET SETTING`, and `COMMENT COLUMN` + `MODIFY SETTING` shapes on a two-replica `ReplicatedMergeTree`, asserting both replicas converge after each ON CLUSTER call.
`MergeTreeData::checkAlterIsPossible` gates the immutable-disk restriction with `!commands.isSettingsAlter() && !commands.isCommentAlter()`, both of which are `std::all_of` checks on the resolved `AlterCommands`. A mixed batch such as `MODIFY COMMENT 'x', MODIFY SETTING old_parts_lifetime = 60` matches neither, so the check threw `SUPPORT_IS_DISABLED` even though every command is metadata-only and the storage-side fast path in `StorageMergeTree::alter` / `StorageReplicatedMergeTree::alter` already handles it via `commands.areNonReplicatedAlterCommands`. Switch the immutable-disk gate to the same `areNonReplicatedAlterCommands` predicate so mixed metadata-only ALTERs are accepted consistently.
Per ClickHouse contribution guidelines (no em-dashes in user-facing text).
ALTER ... MODIFY COLUMN c COMMENT 'x' parses as ASTAlterCommand::MODIFY_COLUMN but AlterCommand::isCommentAlter (src/Storages/AlterCommands.cpp:1211) recognises it as comment-only when only the column comment changes. That asymmetry made DDLWorker::taskShouldBeExecutedOnLeader route ON CLUSTER queries leader-only while StorageReplicatedMergeTree::alter still skipped the replicated log via areNonReplicatedAlterCommands, so followers diverged. Extend ASTAlterQuery::isSettingsOrCommentAlter to recognise the AST shape: a MODIFY_COLUMN command whose column declaration has only a comment set and whose surrounding ASTAlterCommand has no remove_property, settings changes, FIRST or AFTER. Stricter than the resolved-side check on purpose, because any mismatch just routes the ALTER to every replica which is the safe default. Add test_modify_column_comment_only_on_cluster covering the new shape standalone, mixed with MODIFY SETTING and MODIFY COMMENT, plus a negative case with a real type change. Verified both directions locally: the test fails when the helper is short-circuited and passes with the fix in place.
azat
left a comment
There was a problem hiding this comment.
Why not simply check that sorting_key has been changed?
CI ledger — covered SHA:
|
| Check | Test | Disposition |
|---|---|---|
Stateless tests (amd_tsan, parallel, 1/2) |
00096_obfuscator_save_load |
flaky chronic trunk — task 2026-06-02-ci-00096obfuscatorsaveload-chronic-f |
Stateless tests (amd_tsan, parallel, 1/2) |
00175_obfuscator_schema_inference |
flaky chronic trunk — task 2026-06-02-ci-00175obfuscatorschemainference-ab |
Unit tests (asan_ubsan) |
(check-level) | infra/chronic UBSan UntrackedMemoryCounter — task 2026-06-03-ci-p1-chronic-ubsan-in-untrackedmemoryc |
BuzzHouse (amd_debug) |
Logical error: Mutation of \Memory` table produced incomplete output (STID: 2380-2e8c)` |
chronic trunk family bug (StorageMemory APPLY PATCHES) — tracked in 2026-04-23-ci-p0-storagememorymutate-assertion-, our fix PR #105286 awaiting review/merge. NOT PR-caused (this PR only touches StorageMergeTree.cpp ALTER ordering). |
No PR-caused failures. Awaiting human review (@ azat, @ alesapin, @ CurtizJ, @ KochetovNicolai).
Session: cron:our-pr-ci-monitor:20260604-213000
Replace the per-command whitelist of "metadata-only" ALTER shapes with a direct check on whether the resolved sorting-key data types differ between the old and new metadata. Reviewers (azat 2026-06-02 and clickhouse-gh[bot] 2026-06-03) pointed out that the prior whitelist (`isSettingsAlter`, `isCommentAlter`, `areNonReplicatedAlterCommands`, `isSettingsOrCommentAlter`, `isCommentOnlyModifyColumn`) kept leaking new shapes (mixed `MODIFY SETTING + MODIFY COMMENT`, comment-only `MODIFY COLUMN`, `MODIFY COLUMN ... COMMENT FIRST`, `MODIFY COLUMN ... CODEC(...)`, per-column `MODIFY SETTING`). The right gate is the property the check actually depends on: did the sorting key types change. `MergeTreeData::sortingKeyTypesChanged` compares `KeyDescription::data_types` element-wise via printed type names so `SimpleAggregateFunction(any, T)` is correctly distinguished from `T` (`IDataType::equals` only compares underlying typeids and would silently treat them as equal). `StorageMergeTree::alter` and `StorageReplicatedMergeTree::alter` now run `verifySortingKey` only when the types actually changed and the user has not enabled `allow_suspicious_primary_key`. This collapses the previously growing predicate machinery: - Drop `ASTAlterQuery::isSettingsOrCommentAlter` and the `isCommentOnlyModifyColumn` helper introduced in earlier rounds. - Drop the storage-side `areNonReplicatedAlterCommands` mixed-form branches in `StorageMergeTree::alter` and `StorageReplicatedMergeTree::alter`. Mixed batches now go through the standard replicated-log path, which routes the change to every replica via the log entry, so the previous ON CLUSTER divergence can no longer occur. - Drop the matching `isSettingsOrCommentAlter` exclusion from `DDLWorker::taskShouldBeExecutedOnLeader`. The pure-settings and pure-comment exclusions remain because `StorageReplicatedMergeTree::alter` still has the corresponding local-only fast paths. The stateless regression test in `04248_104463_alter_settings_simple_aggregate_func_order_by.sql` is extended to cover the previously uncovered `MODIFY COLUMN` shapes and keeps a negative case where retyping a key column to a suspicious type must still be rejected. The integration test `test_alter_settings_or_comment_on_cluster/test.py` is unchanged in shape; its docstrings now describe the standard replicated-log route. Closes: ClickHouse#104463
Pre-PR validation gate
Session id: cron:clickhouse-ci-task-worker:20260605-075800 |
CI ledger — covered SHA
|
| Failure | Classification | Disposition |
|---|---|---|
Config Workflow / Pre Hooks (HTTP 504 from gh api orgs/ClickHouse/members) |
infra (GitHub API transient) | Not PR-caused. The PR has the can be tested label. A maintainer-triggered re-run on this HEAD would clear it. |
No PR-caused failures. Per TASK.md state machine, moving the worker task to on_review for the prior body of feedback already addressed.
Session id: cron:our-pr-ci-monitor:20260605-110000
…ttings-simple-aggregate
|
@groeneai check the CI failures. |
The 2026-06-05 rework gates verifySortingKey on an actual sorting-key type change instead of enumerating settings/comment command shapes, so the PR no longer adds the DDLWorker routing exclusion or the storage areNonReplicatedAlterCommands fast paths for mixed settings+comment ALTERs. The tests/integration/test_alter_settings_or_comment_on_cluster suite was written to verify that removed routing machinery; its tables have no SimpleAggregateFunction key, so it never exercises the ClickHouse#104463 gate and instead asserts ON CLUSTER convergence behavior this PR does not touch. That behavior is pre-existing master behavior, so the test now fails on every integration batch. Remove it and restore the original DDLWorker.cpp comment so that file is identical to master. The ClickHouse#104463 fix (gate verifySortingKey on sortingKeyTypesChanged) stays fully covered by the stateless test 04248_104463. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@PedroTadim CI triage for HEAD 1. 2. The PR is now the minimal #104463 fix: Note: the pre-existing ON CLUSTER divergence for mixed settings/comment (and comment-only |
Add explicit MODIFY COLUMN ... COMMENT FIRST and AFTER variants on the suspicious key column, plus a column type re-statement combined with a table MODIFY SETTING. These restate the column data type, so they bypass the comment-only fast path and exercise the sortingKeyTypesChanged gate directly: placement and per-column setting changes leave the resolved sorting-key types unchanged, so verifySortingKey is skipped. On a build without the gate they raise DATA_TYPE_CANNOT_BE_USED_IN_KEY.
…ttings-simple-aggregate
Pre-PR validation gateFor the regression-test update in commit 885800b (placement-modifier coverage requested by the review bot).
Session id: cron:clickhouse-worker-slot-1:20260610-202318 |
|
CI finished on `7b5a4fcaa036`. All failures are infrastructure / chronic, none caused by this PR (which touches MergeTree ALTER/sorting-key validation):
Ready for review. |
…ttings-simple-aggregate Resolve conflict in StorageReplicatedMergeTree::alter: master gained the areNonReplicatedAlterCommands() local-apply block for mixed settings/comment ALTERs (PR ClickHouse#107142). Keep it and place this PR's sortingKeyTypesChanged guard on verifySortingKey after that block, so settings/comment-only ALTERs return early and the suspicious-key check fires only when the sorting key types actually change.
The verifySortingKey gate in StorageMergeTree::alter and StorageReplicatedMergeTree::alter compared only the resolved sorting-key data-type names. A MODIFY ORDER BY that swaps one key column for another of the same type (e.g. (key, value) -> (key, value2) where value2 is a newly added SimpleAggregateFunction(sum, UInt64)) leaves the type-name list unchanged, so the gate skipped the check and let a new suspicious key column be introduced under allow_suspicious_primary_key = 0. Compare the resolved key column/expression list (KeyDescription.column_names) in addition to the data types. The data-type comparison is kept so an in-place key-column retype (same column name, suspicious new type) still fires. Renamed sortingKeyTypesChanged -> sortingKeyChanged accordingly. Added a regression case to 04248 covering the column-swap bypass. Reported by clickhouse-gh[bot] on PR ClickHouse#105111. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (update for the column-swap bypass fix)
Session id: cron:clickhouse-worker-slot-0:20260615-081200 |
CI status for
|
CI status — HEAD
|
| Check / test | Class | Disposition |
|---|---|---|
| Bugfix validation: Lost forever for SharedMergeTree / Lost s3 keys / S3_ERROR No such key / OOM in dmesg / Exception in test runner | infra (chronic) | 44 PRs, 0 master in 7d. Not PR-caused. |
| Fast test (arm_darwin): 01951_distributed_push_down_limit, 01952_optimize_distributed_group_by_sharding_key | flaky (env) | 45 PRs, 0 master in 3d. macOS loopback-alias leak tracked in #107435. Not PR-caused. |
No PR-caused failures. Awaiting review.
…ttings-simple-aggregate Resolves a context conflict in StorageReplicatedMergeTree::alter: master introduced a named metadata_snapshot local; this branch added an old_sorting_key snapshot used to gate verifySortingKey. Kept both.
The 104463 fix also gates verifySortingKey in StorageReplicatedMergeTree::alter,
but the stateless test only exercised the non-replicated MergeTree paths.
03020_order_by_SimpleAggregateFunction already covers the replicated rejection
(a real sorting-key change is still rejected); add the new skip path: a non-key
ALTER (MODIFY SETTING / MODIFY COMMENT / ADD COLUMN) on a grandfathered
ReplicatedAggregatingMergeTree with a SimpleAggregateFunction sorting key must
succeed under allow_suspicious_primary_key = 0. The {database} macro keeps the
ZooKeeper path unique so the test stays parallel-safe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate
Session id: cron:clickhouse-worker-slot-0:20260621-075100 |
|
CI finished on HEAD
No PR-caused failures. ( Session: cron:our-pr-ci-monitor:20260621-133000 |
…ttings-simple-aggregate
The master merge brought in `04248_97771_lazy_materialization_alias_invalid_chunk`, which shares the `04248` numeric prefix with this PR's regression test. Renumber this PR's test to the next free prefix (`04498`) to keep test numbers unique.
|
Merged current One conflict-free follow-up was needed: the master merge introduced Verified the three changed translation units ( The two reds on the previous head (
Both will re-run on the new head. |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 31/34 (91.18%) · Uncovered code |
CI finish ledger — b81fedaCI fully finished (Finish Workflow + Mergeable Check pass). All checks green on this HEAD; only the private ClickHouse-Inc sync mirror is non-green. Session id: cron:our-pr-ci-monitor:20260703-053000 |

StorageMergeTree::alterandStorageReplicatedMergeTree::alterre-ranMergeTreeData::verifySortingKeyon (almost) everyALTERwhenallow_suspicious_primary_keywas off. As a result, unrelated ALTERs(settings-only, comment-only, codec changes,
ADD COLUMNof a non-key column,column placement modifiers, …) failed with
DATA_TYPE_CANNOT_BE_USED_IN_KEYontables that had been created with
allow_suspicious_primary_key = 1once thatsession-level setting was no longer in effect:
The property that actually matters is whether the resolved sorting key changed,
not which
ALTERshape was used. Earlier attempts that whitelisted specific"metadata-only"
ALTERshapes kept missing cases (mixedMODIFY COMMENT+MODIFY SETTING, comment-onlyMODIFY COLUMN, placement modifiers, codecchanges, per-column
MODIFY SETTING, …). This PR replaces the whitelist with adirect comparison.
A new helper
MergeTreeData::sortingKeyChangedcompares the resolved sortingkey of the old and new metadata — both its key column/expression list
(
KeyDescription::column_names) and the printed names of its data types(
KeyDescription::data_types).verifySortingKeynow runs only whenallow_suspicious_primary_keyis off and the sorting key actually changed.The check is applied at the same position in both
StorageMergeTree::alterandStorageReplicatedMergeTree::alter, so the two engines now agree.Comparing the type names (rather than
IDataType::equals, which only checksunderlying-type identity) is required so that retyping a key column to a
suspicious type (e.g.
Int32->SimpleAggregateFunction(any, Int32)) is stilldetected and rejected. Comparing the column list is required so that swapping a
key column for another column of the same type (
MODIFY ORDER BY) is alsodetected.
A stateless regression test
(
04417_104463_alter_settings_simple_aggregate_func_order_by) covers theaccepted ALTERs (
MODIFY SETTING,RESET SETTING,MODIFY COMMENT,COMMENT COLUMN, mixed settings/comment statements, comment/codec/placementMODIFY COLUMN, per-columnMODIFY SETTING,ADD COLUMNof a non-key column —on
MergeTree,AggregatingMergeTree, andReplicatedAggregatingMergeTree)and the rejected ALTERs that genuinely change the sorting key (retyping a key
column to a suspicious type, swapping a key column via
MODIFY ORDER BY). Userscan still opt in with
allow_suspicious_primary_key = 1.Closes: #104463
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix an unexpected
DATA_TYPE_CANNOT_BE_USED_IN_KEYerror onALTERqueries that cannot change the sorting key (settings, comments, codecs, adding a non-key column, etc.) forMergeTreetables that contain aSimpleAggregateFunction(or another type allowed only withallow_suspicious_primary_key = 1) in the sorting key. The sorting key is now re-verified only for ALTERs that can actually change it, matching the existingReplicatedMergeTreebehavior.Documentation entry for user-facing changes