Fix settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY by groeneai · Pull Request #105111 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY#105111

Open
groeneai wants to merge 21 commits into
ClickHouse:masterfrom
groeneai:fix-104463-alter-settings-simple-aggregate
Open

Fix settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY#105111
groeneai wants to merge 21 commits into
ClickHouse:masterfrom
groeneai:fix-104463-alter-settings-simple-aggregate

Conversation

@groeneai

@groeneai groeneai commented May 16, 2026

Copy link
Copy Markdown
Contributor

StorageMergeTree::alter and StorageReplicatedMergeTree::alter re-ran
MergeTreeData::verifySortingKey on (almost) every ALTER when
allow_suspicious_primary_key was off. As a result, unrelated ALTERs
(settings-only, comment-only, codec changes, ADD COLUMN of a non-key column,
column placement modifiers, …) failed with DATA_TYPE_CANNOT_BE_USED_IN_KEY on
tables that had been created with allow_suspicious_primary_key = 1 once that
session-level setting was no longer in effect:

SET allow_suspicious_primary_key = 1;
CREATE TABLE t (key Int, value SimpleAggregateFunction(sum, UInt64))
ENGINE = AggregatingMergeTree() ORDER BY (key, value);

SET allow_suspicious_primary_key = 0;
ALTER TABLE t MODIFY SETTING index_granularity = 8192;
-- Code: 549. DB::Exception: Column with type
-- SimpleAggregateFunction(sum, UInt64) is not allowed in key expression.

The property that actually matters is whether the resolved sorting key changed,
not which ALTER shape was used. Earlier attempts that whitelisted specific
"metadata-only" ALTER shapes kept missing cases (mixed MODIFY COMMENT +
MODIFY SETTING, comment-only MODIFY COLUMN, placement modifiers, codec
changes, per-column MODIFY SETTING, …). This PR replaces the whitelist with a
direct comparison.

A new helper MergeTreeData::sortingKeyChanged compares the resolved sorting
key of the old and new metadata — both its key column/expression list
(KeyDescription::column_names) and the printed names of its data types
(KeyDescription::data_types). verifySortingKey now runs only when
allow_suspicious_primary_key is off and the sorting key actually changed.
The check is applied at the same position in both StorageMergeTree::alter and
StorageReplicatedMergeTree::alter, so the two engines now agree.

Comparing the type names (rather than IDataType::equals, which only checks
underlying-type identity) is required so that retyping a key column to a
suspicious type (e.g. Int32 -> SimpleAggregateFunction(any, Int32)) is still
detected and rejected. Comparing the column list is required so that swapping a
key column for another column of the same type (MODIFY ORDER BY) is also
detected.

A stateless regression test
(04417_104463_alter_settings_simple_aggregate_func_order_by) covers the
accepted ALTERs (MODIFY SETTING, RESET SETTING, MODIFY COMMENT,
COMMENT COLUMN, mixed settings/comment statements, comment/codec/placement
MODIFY COLUMN, per-column MODIFY SETTING, ADD COLUMN of a non-key column —
on MergeTree, AggregatingMergeTree, and ReplicatedAggregatingMergeTree)
and the rejected ALTERs that genuinely change the sorting key (retyping a key
column to a suspicious type, swapping a key column via MODIFY ORDER BY). Users
can still opt in with allow_suspicious_primary_key = 1.

Closes: #104463

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix an unexpected DATA_TYPE_CANNOT_BE_USED_IN_KEY error on ALTER queries that cannot change the sorting key (settings, comments, codecs, adding a non-key column, etc.) for MergeTree tables that contain a SimpleAggregateFunction (or another type allowed only with allow_suspicious_primary_key = 1) in the sorting key. The sorting key is now re-verified only for ALTERs that can actually change it, matching the existing ReplicatedMergeTree behavior.

Documentation entry for user-facing changes

  • Documentation is unchanged

…ORDER BY

`StorageMergeTree::alter` invoked `verifySortingKey` unconditionally before
the early-return branches for `isSettingsAlter` and `isCommentAlter`. That
caused unrelated ALTERs to fail with `DATA_TYPE_CANNOT_BE_USED_IN_KEY` on
tables that were originally created with `allow_suspicious_primary_key = 1`
once that session-level setting was no longer in effect, for instance:

    SET allow_suspicious_primary_key = 1;
    CREATE TABLE t (key Int, value SimpleAggregateFunction(sum, UInt64))
    ENGINE = AggregatingMergeTree() ORDER BY (key, value);

    SET allow_suspicious_primary_key = 0;
    ALTER TABLE t MODIFY SETTING index_granularity = 8192;
    -- Code: 549. DB::Exception: Column with type
    -- SimpleAggregateFunction(sum, UInt64) is not allowed in key expression.

`StorageReplicatedMergeTree::alter` already runs the verification after the
settings-only and comment-only early returns, so the two engines disagree.

Move the call into the same position in `StorageMergeTree::alter`. Settings
and comment ALTERs cannot change the sorting key, so re-verifying it there
is unnecessary; ALTERs that can change the key still verify it.

A stateless regression test in `04248_104463_*` covers `MODIFY SETTING`,
`RESET SETTING`, `MODIFY COMMENT`, `COMMENT COLUMN`, the rejected
`ADD COLUMN` path, and the same `ADD COLUMN` succeeding when the user
re-enables `allow_suspicious_primary_key`.

Closes ClickHouse#104463
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (worker)

(a) Deterministic repro? Yes — running the exact ALTER TABLE t MODIFY SETTING index_granularity = 8192; from the issue against an unmodified build reliably triggers Code: 549 DATA_TYPE_CANNOT_BE_USED_IN_KEY.

(b) Root cause explained? StorageMergeTree::alter invoked MergeTreeData::verifySortingKey unconditionally before the if (commands.isSettingsAlter()) ... else if (commands.isCommentAlter()) ... else ... chain. For tables originally created with allow_suspicious_primary_key = 1 but altered in a session that has it disabled, that verification rejected the new metadata's sorting key even though the ALTER does not touch the key. StorageReplicatedMergeTree::alter calls verifySortingKey only after the settings/comment early returns, so the two engines were inconsistent.

(c) Fix matches root cause? Yes — the verifySortingKey call is moved into the else branch (i.e. only when the ALTER is neither settings-only nor comment-only). This is the exact position StorageReplicatedMergeTree::alter uses, and AlterCommand::isSettingsAlter/isCommentAlter only return true for MODIFY_SETTING/RESET_SETTING/COMMENT_* commands which cannot change the sorting key.

(d) Test intent preserved / new tests added? Existing tests are untouched. A new stateless test 04248_104463_alter_settings_simple_aggregate_func_order_by covers MODIFY SETTING, RESET SETTING, MODIFY COMMENT, COMMENT COLUMN succeeding; ADD COLUMN still rejected with DATA_TYPE_CANNOT_BE_USED_IN_KEY; and ADD COLUMN succeeding once allow_suspicious_primary_key = 1 is re-enabled — locking in both directions of the behavior.

(e) Both directions demonstrated? Yes. Locally, on a debug build of the patched branch with the fix reverted (via git stash) the new test fails immediately on the first MODIFY SETTING; with the fix restored the test passes 50/50 with full randomization.

(f) Fix is general, not a narrow patch? verifySortingKey has only two ALTER call sites in the codebase (StorageMergeTree::alter, StorageReplicatedMergeTree::alter) and they now agree. CREATE/ATTACH call sites in registerStorageMergeTree.cpp are unrelated and not affected. No defensive nullptr/early-return guards were introduced — the call simply moved to where it semantically belongs.

session: cron:clickhouse-ci-task-worker:20260516-100000

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 16, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [b81feda]

Summary:


AI Review

Summary

This PR now gates verifySortingKey on an actual resolved sorting-key change instead of a brittle whitelist of ALTER shapes. That closes the original allow_suspicious_primary_key regression and the follow-up gaps around mixed settings/comment ALTERs, metadata-only MODIFY COLUMN, same-type MODIFY ORDER BY swaps, and the replicated path. After checking the current diff, surrounding code, and the full prior review history, I did not find a remaining correctness issue in the touched code.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 16, 2026
Comment thread src/Storages/StorageMergeTree.cpp Outdated
…ting sorting key

`AlterCommands::isSettingsAlter` and `AlterCommands::isCommentAlter` are
both `all_of` checks, so a mixed non-replicated statement like

    ALTER TABLE t MODIFY COMMENT 'x', MODIFY SETTING merge_with_ttl_timeout = 60;

falls into the trailing `else` of `StorageMergeTree::alter` and still
runs `MergeTreeData::verifySortingKey`. On a table created with
`allow_suspicious_primary_key = 1` that still raises
`DATA_TYPE_CANNOT_BE_USED_IN_KEY` even though neither subcommand can
change `ORDER BY`. This is the same gap the original PR closed for the
pure-settings and pure-comment shapes; it just survives the `all_of`
predicates whenever both kinds of subcommands appear together.

Use the existing `AlterCommands::areNonReplicatedAlterCommands` helper
(`all_of (isSettingsAlter || isCommentAlter)`) to add a dedicated
metadata-only branch in both `StorageMergeTree::alter` and
`StorageReplicatedMergeTree::alter`, mirroring the existing pure-shape
branches. Statements that include a sorting-key-relevant subcommand
(e.g. `ADD COLUMN`) still skip the new branch and fall through to the
existing key-verifying path.

Extend the regression test for issue ClickHouse#104463 with mixed-command cases:

* `MODIFY COMMENT … , MODIFY SETTING …`
* `COMMENT COLUMN … , MODIFY SETTING …`
* `MODIFY COMMENT … , RESET SETTING …`
* a `MODIFY COMMENT … , ADD COLUMN …` negative case that must still be
  rejected with `DATA_TYPE_CANNOT_BE_USED_IN_KEY`.
@groeneai

Copy link
Copy Markdown
Contributor Author

Pushed commit 52c3849 addressing the mixed-command gap flagged by clickhouse-gh[bot].

Bot's concern, restated

AlterCommands::isSettingsAlter and AlterCommands::isCommentAlter are both all_of checks per individual command. A mixed statement like

ALTER TABLE t MODIFY COMMENT 'x', MODIFY SETTING merge_with_ttl_timeout = 60;

contains commands that satisfy one predicate each but not both, so both helpers return false. The mixed shape then fell through to the trailing else branch in StorageMergeTree::alter and still ran MergeTreeData::verifySortingKey, re-introducing the DATA_TYPE_CANNOT_BE_USED_IN_KEY exception that the original PR was meant to remove. The pure-settings / pure-comment branches were correctly skipping the check; the bot is right that the all_of predicates don't compose for mixed statements.

(Note: MODIFY SETTING greedily consumes a comma-separated setting list, so the bot's example syntax MODIFY SETTING X = Y, MODIFY COMMENT 'z' raises a parser error in 26.5 — but the equivalent shape MODIFY COMMENT 'z', MODIFY SETTING X = Y parses fine and exhibits the bug.)

Fix

Added a dedicated else if (commands.areNonReplicatedAlterCommands()) branch (using the existing helper which is all_of (isSettingsAlter || isCommentAlter)) in StorageMergeTree::alter. The same gap exists by symmetry in StorageReplicatedMergeTree::alter, so the equivalent branch was added there as well. Both branches perform changeSettings + setInMemoryMetadata + alterTable and skip verifySortingKey. Statements that mix in a sorting-key-relevant subcommand (e.g. ADD COLUMN) still fail areNonReplicatedAlterCommands and fall through to the unchanged key-verifying branch.

Tests

Extended tests/queries/0_stateless/04248_104463_alter_settings_simple_aggregate_func_order_by.sql with:

  • MODIFY COMMENT 'x', MODIFY SETTING merge_with_ttl_timeout = 60 (must pass on both MergeTree and AggregatingMergeTree)
  • COMMENT COLUMN value 'x', MODIFY SETTING merge_with_ttl_timeout = 45 (must pass)
  • MODIFY COMMENT 'x', RESET SETTING merge_with_ttl_timeout (must pass)
  • MODIFY COMMENT 'cmt', ADD COLUMN extra2 Int (must still throw DATA_TYPE_CANNOT_BE_USED_IN_KEY — guards against the new branch eating sorting-key-relevant ALTERs)

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes — temporarily reverted just the StorageMergeTree.cpp branch, rebuilt, ran ALTER TABLE t_mt MODIFY COMMENT 'no-fix should fail', MODIFY SETTING merge_with_ttl_timeout = 100, got Code: 549. DATA_TYPE_CANNOT_BE_USED_IN_KEY. Same query passes once the branch is restored.
b Root cause explained? commands.isSettingsAlter() and commands.isCommentAlter() are all_of predicates per individual command. A mixed statement satisfies neither, falls into the else branch, runs verifySortingKey, which throws on the suspicious SimpleAggregateFunction key.
c Fix matches root cause? Adds a branch keyed on areNonReplicatedAlterCommands (the existing helper that uses all_of (isSettingsAlter || isCommentAlter)), which exactly closes the all_of composition gap.
d Test intent preserved? Original positive (settings-only / comment-only) and negative (ADD COLUMN rejected) cases are untouched. New cases assert mixed statements succeed AND that mixed statements containing a key-relevant subcommand still fail.
e Both directions demonstrated? Yes — see (a).
f Fix is general, not a narrow patch? The symmetric implementation in StorageReplicatedMergeTree::alter had the identical structural bug. Fixed both in this commit.

Verification

  • clickhouse-test 04248_104463_alter_settings_simple_aggregate_func_order_by: 1/1 pass
  • clickhouse-test --test-runs 50 04248_104463_alter_settings_simple_aggregate_func_order_by (randomized settings on): 50/50 pass

Session: cron:clickhouse-ci-task-worker:20260516-164500.

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

@PedroTadim

Copy link
Copy Markdown
Member

@groeneai check the CI failures, maybe another merge with master is needed

@groeneai

Copy link
Copy Markdown
Contributor Author

@PedroTadim master merged in 0a2929865c61. CI triage of the previous (c139820a3db4) run:

The only failing job was Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel), and both failing tests are chronic flakies on that job, unrelated to this PR:

Neither test touches StorageMergeTree::alter / StorageReplicatedMergeTree::alter or the ORDER BY / SimpleAggregateFunction path this PR fixes, and both fail across many unrelated PRs on the same coverage_per_test job. CI will re-run on the merged HEAD; if these two remain it's the same chronic, not a PR regression.

The mixed-command areNonReplicatedAlterCommands fix (originally 52c384966cde) survived the merge — both branches still emit the dedicated mixed MODIFY SETTING + MODIFY COMMENT / COMMENT COLUMN / RESET SETTING path in StorageMergeTree.cpp and StorageReplicatedMergeTree.cpp, and the regression test still covers all four mixed shapes (plus the negative mixed-with-ADD-COLUMN case). 50/50 randomized runs pass locally.

Comment thread src/Functions/tests/gtest_functions_stress.cpp Outdated
The 2026-05-27 master merge into this PR branch accidentally undid the
master-side changes to three files unrelated to this PR's StorageMergeTree
ALTER fix:

  - src/Functions/tests/gtest_functions_stress.cpp: re-introduced the
    per-iteration max_execution_time / CancellationChecker /
    ProcessList wiring (originally landed in ClickHouse#105146, reverted by
    ClickHouse#105163, re-landed on master). Without this, function_prop_fuzzer
    iterations cannot be interrupted and the job regresses to long/stuck
    behavior.
  - src/Interpreters/CancellationChecker.cpp: re-introduced the
    'stop_thread = false' reset at the end of workerFunction so the
    singleton can be restarted in tests.
  - tests/queries/0_stateless/03357_join_pk_sharding.sql: re-added the
    'no-tsan' tag landed by ClickHouse#105902.

Reverting these hunks restores the master versions so this PR only ships
the intended StorageMergeTree / StorageReplicatedMergeTree areNonReplicatedAlterCommands
branch and the new 04248 regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread src/Storages/StorageReplicatedMergeTree.cpp
Comment thread src/Storages/StorageMergeTree.cpp Outdated
groeneai added 3 commits May 27, 2026 15:35
`DDLWorker::taskShouldBeExecutedOnLeader` uses `ASTAlterQuery::isSettingsAlter`
and `isCommentAlter`, both of which are `isOneCommandTypeOnly` checks. A mixed
batch such as `ALTER TABLE t ON CLUSTER ... MODIFY COMMENT 'x', MODIFY SETTING
old_parts_lifetime = 123` matches neither, so the routing fell through to the
default leader-only path. The storage-side fast path in
`StorageReplicatedMergeTree::alter` for `areNonReplicatedAlterCommands` applies
the change as local metadata-only work without writing a replicated log entry,
so leader-only routing leaves the followers diverged.

Add a per-command predicate `ASTAlterQuery::isSettingsOrCommentAlter` that
returns true when every command in the batch is `MODIFY SETTING`,
`RESET SETTING`, `COMMENT COLUMN`, or `MODIFY COMMENT` (the AST-side equivalent
of `AlterCommands::areNonReplicatedAlterCommands`), and wire it into the
routing disjunction so mixed batches are executed on every replica.

Add `tests/integration/test_alter_settings_or_comment_on_cluster` covering
the mixed `MODIFY COMMENT` + `MODIFY SETTING`, mixed `MODIFY COMMENT` +
`RESET SETTING`, and `COMMENT COLUMN` + `MODIFY SETTING` shapes on a
two-replica `ReplicatedMergeTree`, asserting both replicas converge after
each ON CLUSTER call.
`MergeTreeData::checkAlterIsPossible` gates the immutable-disk restriction
with `!commands.isSettingsAlter() && !commands.isCommentAlter()`, both of
which are `std::all_of` checks on the resolved `AlterCommands`. A mixed
batch such as `MODIFY COMMENT 'x', MODIFY SETTING old_parts_lifetime = 60`
matches neither, so the check threw `SUPPORT_IS_DISABLED` even though every
command is metadata-only and the storage-side fast path in
`StorageMergeTree::alter` / `StorageReplicatedMergeTree::alter` already
handles it via `commands.areNonReplicatedAlterCommands`.

Switch the immutable-disk gate to the same `areNonReplicatedAlterCommands`
predicate so mixed metadata-only ALTERs are accepted consistently.
Per ClickHouse contribution guidelines (no em-dashes in user-facing text).
Comment thread src/Parsers/ASTAlterQuery.cpp Outdated
ALTER ... MODIFY COLUMN c COMMENT 'x' parses as ASTAlterCommand::MODIFY_COLUMN
but AlterCommand::isCommentAlter (src/Storages/AlterCommands.cpp:1211)
recognises it as comment-only when only the column comment changes. That
asymmetry made DDLWorker::taskShouldBeExecutedOnLeader route ON CLUSTER
queries leader-only while StorageReplicatedMergeTree::alter still skipped
the replicated log via areNonReplicatedAlterCommands, so followers diverged.

Extend ASTAlterQuery::isSettingsOrCommentAlter to recognise the AST shape:
a MODIFY_COLUMN command whose column declaration has only a comment set
and whose surrounding ASTAlterCommand has no remove_property, settings
changes, FIRST or AFTER. Stricter than the resolved-side check on purpose,
because any mismatch just routes the ALTER to every replica which is the
safe default.

Add test_modify_column_comment_only_on_cluster covering the new shape
standalone, mixed with MODIFY SETTING and MODIFY COMMENT, plus a negative
case with a real type change. Verified both directions locally: the test
fails when the helper is short-circuited and passes with the fix in place.
@azat azat self-assigned this Jun 2, 2026

@azat azat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply check that sorting_key has been changed?

Comment thread src/Parsers/ASTAlterQuery.cpp Outdated
Comment thread src/Storages/StorageMergeTree.cpp Outdated
@groeneai

groeneai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger — covered SHA: 287f46e3d46ed94647d4cb1a6e4626d055729f27

CI fully finished (Mergeable Check completed; 0 pending checks).

Check Test Disposition
Stateless tests (amd_tsan, parallel, 1/2) 00096_obfuscator_save_load flaky chronic trunk — task 2026-06-02-ci-00096obfuscatorsaveload-chronic-f
Stateless tests (amd_tsan, parallel, 1/2) 00175_obfuscator_schema_inference flaky chronic trunk — task 2026-06-02-ci-00175obfuscatorschemainference-ab
Unit tests (asan_ubsan) (check-level) infra/chronic UBSan UntrackedMemoryCounter — task 2026-06-03-ci-p1-chronic-ubsan-in-untrackedmemoryc
BuzzHouse (amd_debug) Logical error: Mutation of \Memory` table produced incomplete output (STID: 2380-2e8c)` chronic trunk family bug (StorageMemory APPLY PATCHES) — tracked in 2026-04-23-ci-p0-storagememorymutate-assertion-, our fix PR #105286 awaiting review/merge. NOT PR-caused (this PR only touches StorageMergeTree.cpp ALTER ordering).

No PR-caused failures. Awaiting human review (@ azat, @ alesapin, @ CurtizJ, @ KochetovNicolai).

Session: cron:our-pr-ci-monitor:20260604-213000

Replace the per-command whitelist of "metadata-only" ALTER shapes with a
direct check on whether the resolved sorting-key data types differ between
the old and new metadata. Reviewers (azat 2026-06-02 and clickhouse-gh[bot]
2026-06-03) pointed out that the prior whitelist (`isSettingsAlter`,
`isCommentAlter`, `areNonReplicatedAlterCommands`, `isSettingsOrCommentAlter`,
`isCommentOnlyModifyColumn`) kept leaking new shapes (mixed
`MODIFY SETTING + MODIFY COMMENT`, comment-only `MODIFY COLUMN`,
`MODIFY COLUMN ... COMMENT FIRST`, `MODIFY COLUMN ... CODEC(...)`,
per-column `MODIFY SETTING`). The right gate is the property the check
actually depends on: did the sorting key types change.

`MergeTreeData::sortingKeyTypesChanged` compares
`KeyDescription::data_types` element-wise via printed type names so
`SimpleAggregateFunction(any, T)` is correctly distinguished from `T`
(`IDataType::equals` only compares underlying typeids and would silently
treat them as equal). `StorageMergeTree::alter` and
`StorageReplicatedMergeTree::alter` now run `verifySortingKey` only when
the types actually changed and the user has not enabled
`allow_suspicious_primary_key`.

This collapses the previously growing predicate machinery:
- Drop `ASTAlterQuery::isSettingsOrCommentAlter` and the
  `isCommentOnlyModifyColumn` helper introduced in earlier rounds.
- Drop the storage-side `areNonReplicatedAlterCommands` mixed-form
  branches in `StorageMergeTree::alter` and
  `StorageReplicatedMergeTree::alter`. Mixed batches now go through the
  standard replicated-log path, which routes the change to every replica
  via the log entry, so the previous ON CLUSTER divergence can no longer
  occur.
- Drop the matching `isSettingsOrCommentAlter` exclusion from
  `DDLWorker::taskShouldBeExecutedOnLeader`. The pure-settings and
  pure-comment exclusions remain because `StorageReplicatedMergeTree::alter`
  still has the corresponding local-only fast paths.

The stateless regression test in
`04248_104463_alter_settings_simple_aggregate_func_order_by.sql` is
extended to cover the previously uncovered `MODIFY COLUMN` shapes and
keeps a negative case where retyping a key column to a suspicious type
must still be rejected. The integration test
`test_alter_settings_or_comment_on_cluster/test.py` is unchanged in
shape; its docstrings now describe the standard replicated-log route.

Closes: ClickHouse#104463
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. With the new test case ALTER TABLE t_104463_mt MODIFY COLUMN value CODEC(ZSTD) (and similar MODIFY COLUMN shapes that cannot affect the sorting key) on a SimpleAggregateFunction-keyed table created with allow_suspicious_primary_key = 1, the pre-fix server (commit 4cbb5fed9ad, before this rework) deterministically rejects the ALTER with Code: 549. DATA_TYPE_CANNOT_BE_USED_IN_KEY: Column with type SimpleAggregateFunction(sum, UInt64) is not allowed in key expression. After this rework, the same ALTER succeeds.
b Root cause explained? StorageMergeTree::alter and StorageReplicatedMergeTree::alter ran verifySortingKey on every ALTER that did not match a hand-rolled "metadata-only" predicate. Each round of review found a new ALTER shape the predicate did not recognise (mixed MODIFY SETTING + MODIFY COMMENT, comment-only MODIFY COLUMN, MODIFY COLUMN c COMMENT 'x' FIRST, MODIFY COLUMN c CODEC(...), MODIFY COLUMN c MODIFY SETTING ...). The actual property the check needs is whether the resolved sorting-key data types changed; the predicate was a leaky proxy.
c Fix matches root cause? Yes. MergeTreeData::sortingKeyTypesChanged compares KeyDescription::data_types element-wise by printed type name, which is exactly the property verifySortingKey inspects. The check now fires only when the new sorting-key types could observe a different verdict than at CREATE time, so any ALTER that cannot affect the resolved key types passes regardless of its command shape. The previous predicate machinery (ASTAlterQuery::isSettingsOrCommentAlter, isCommentOnlyModifyColumn, areNonReplicatedAlterCommands storage-side branches, the matching DDLWorker exclusion) is removed.
d Test intent preserved / new tests added? Yes. The stateless regression test for issue #104463 keeps every prior positive case, adds the previously uncovered MODIFY COLUMN c COMMENT 'x', MODIFY COLUMN c COMMENT 'x' / type-restate mixed, MODIFY COLUMN c CODEC(ZSTD), and MODIFY COLUMN c MODIFY SETTING ... shapes, and keeps a negative case (MODIFY COLUMN key SimpleAggregateFunction(any, Int32)) that genuinely changes the resolved key types and must still be rejected. The integration test test_alter_settings_or_comment_on_cluster is unchanged in shape; its docstrings now reflect that mixed forms route through the standard replicated log.
e Both directions demonstrated? Yes. With the prior commit 4cbb5fed9ad checked out (and this rework reverted via git stash), tests/clickhouse-test --no-random-settings 04248_104463 fails with Code: 549. DATA_TYPE_CANNOT_BE_USED_IN_KEY. With the rework restored, the same test passes 50/50 with no randomisation and 15/20 with full randomisation (the 5 failures are the unrelated Invalid time zone: zoneinfo/UTC infra flake that can hit any test).
f Fix is general, not a narrow patch? Yes. The new gate is applied symmetrically in StorageMergeTree::alter and StorageReplicatedMergeTree::alter, in the same position both files used before. Sibling code paths reviewed: MergeTreeData::checkAlterIsPossible (immutable-disk gate, line 4429) keeps commands.areNonReplicatedAlterCommands() because that predicate is a per-command "data-file-free" check serving a different purpose, not a sorting-key proxy. CREATE-time verifySortingKey callers in registerStorageMergeTree.cpp are unchanged (their call site is the one that the user is grandfathered against).

Session id: cron:clickhouse-ci-task-worker:20260605-075800

@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger — covered SHA da5e71e9

CI fully finished (Finish Workflow=SUCCESS) but the praktika Config Workflow job failed with an infrastructure transient: the can_be_tested check ran during a window when gh api orgs/ClickHouse/members returned HTTP 504 ("We couldn't respond to your request in time"). The org-members lookup couldn't verify membership and the check falsely reported 'can be tested' label is required despite the PR having the can be tested label. With the config job failed, 160 downstream checks were dropped/skipped, so this run never produced test signal.

Failure Classification Disposition
Config Workflow / Pre Hooks (HTTP 504 from gh api orgs/ClickHouse/members) infra (GitHub API transient) Not PR-caused. The PR has the can be tested label. A maintainer-triggered re-run on this HEAD would clear it.

No PR-caused failures. Per TASK.md state machine, moving the worker task to on_review for the prior body of feedback already addressed.

Session id: cron:our-pr-ci-monitor:20260605-110000

@PedroTadim

Copy link
Copy Markdown
Member

@groeneai check the CI failures.

The 2026-06-05 rework gates verifySortingKey on an actual sorting-key
type change instead of enumerating settings/comment command shapes, so
the PR no longer adds the DDLWorker routing exclusion or the storage
areNonReplicatedAlterCommands fast paths for mixed settings+comment
ALTERs. The tests/integration/test_alter_settings_or_comment_on_cluster
suite was written to verify that removed routing machinery; its tables
have no SimpleAggregateFunction key, so it never exercises the ClickHouse#104463
gate and instead asserts ON CLUSTER convergence behavior this PR does
not touch. That behavior is pre-existing master behavior, so the test
now fails on every integration batch. Remove it and restore the
original DDLWorker.cpp comment so that file is identical to master.

The ClickHouse#104463 fix (gate verifySortingKey on sortingKeyTypesChanged) stays
fully covered by the stateless test 04248_104463.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@PedroTadim CI triage for HEAD b40e8504 (now superseded by f7cb2489):

1. test_alter_settings_or_comment_on_cluster (6 integration batches) - PR-caused, now removed.
This was our own new test. The 2026-06-05 rework (gate verifySortingKey on an actual sorting-key type change, per your "just check whether sorting_key has been changed" suggestion) dropped the DDLWorker routing exclusion and the storage areNonReplicatedAlterCommands fast paths for mixed settings+comment ALTERs. After that rework the PR's DDLWorker routing is byte-identical to master. But the test was written to verify that removed routing machinery, and its tables have no SimpleAggregateFunction key, so it never exercises the #104463 gate; it just asserts ON CLUSTER convergence for MODIFY COMMENT, MODIFY SETTING old_parts_lifetime=123, which is now plain master behavior the PR no longer touches. Removed it (f7cb2489) and restored the original DDLWorker.cpp comment so that file is identical to master. The #104463 fix stays fully covered by the stateless test 04248_104463.

2. Stateless tests (amd_debug, parallel) - chronic master regression, unrelated.
Logical error: Virtual row boundary violated in MergingSortedAlgorithm (STID 2651-3359) plus the Server died / 03254_last_2_samples_aggregate_function collateral. CIDB last 7 days: 82 hits across 68 distinct PRs, 6 on master, first seen 2026-06-06. Not in any code path this PR touches (no MergingSortedAlgorithm / virtual-row code here).

The PR is now the minimal #104463 fix: sortingKeyTypesChanged helper + the gated verifySortingKey in both storage alter paths + the immutable-disk gate using the equivalent areNonReplicatedAlterCommands predicate + the stateless regression test. 6 files.

Note: the pre-existing ON CLUSTER divergence for mixed settings/comment (and comment-only MODIFY COLUMN ... SETTINGS(...)) that the bot flagged is real but is master behavior independent of this PR, so I am tracking it separately rather than expanding this fix.

groeneai added 2 commits June 10, 2026 20:39
Add explicit MODIFY COLUMN ... COMMENT FIRST and AFTER variants on the
suspicious key column, plus a column type re-statement combined with a
table MODIFY SETTING. These restate the column data type, so they bypass
the comment-only fast path and exercise the sortingKeyTypesChanged gate
directly: placement and per-column setting changes leave the resolved
sorting-key types unchanged, so verifySortingKey is skipped. On a build
without the gate they raise DATA_TYPE_CANNOT_BE_USED_IN_KEY.
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

For the regression-test update in commit 885800b (placement-modifier coverage requested by the review bot).

# Question Answer
a Deterministic repro? Yes. On a build without the gate, ALTER TABLE t MODIFY COLUMN value SimpleAggregateFunction(sum, UInt64) COMMENT 'modcol-first' FIRST (and the AFTER key / mixed-MODIFY SETTING variants) raise Code: 549 DATA_TYPE_CANNOT_BE_USED_IN_KEY every time once allow_suspicious_primary_key = 0.
b Root cause explained? The new ALTERs restate the column data type, so AlterCommand::isCommentAlter returns false (data_type != nullptr) and they bypass the comment-only fast path. They land in the else branch, which on a gate-less build always re-runs verifySortingKey. Placement modifiers and a per-statement table MODIFY SETTING do not change the resolved sorting-key data types, so the check should be skipped.
c Fix matches root cause? This is a test-coverage addition for an already-merged-into-branch gate. The new cases assert the sortingKeyTypesChanged gate is reached and correctly skips verifySortingKey for placement-modifier MODIFY COLUMN shapes, the exact form the bot flagged as previously uncovered.
d Test intent preserved / new tests added? All prior positive/negative cases unchanged. Added three positive cases (FIRST, AFTER key, type-restatement + MODIFY SETTING); the existing negative case (MODIFY COLUMN key SimpleAggregateFunction(any, Int32), a genuine key-type change) still asserts DATA_TYPE_CANNOT_BE_USED_IN_KEY.
e Both directions demonstrated? Yes. Branch HEAD (gate present): new ALTERs succeed; full test 20/20 --no-random-settings, 40/40 with randomization. Master HEAD (no gate): each new ALTER throws Code: 549.
f Fix is general, not a narrow patch? N/A (regression-test coverage, not a code change). The underlying gate is applied symmetrically in StorageMergeTree::alter and StorageReplicatedMergeTree::alter.

Session id: cron:clickhouse-worker-slot-1:20260610-202318

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on `7b5a4fcaa036`. All failures are infrastructure / chronic, none caused by this PR (which touches MergeTree ALTER/sorting-key validation):

  • `Stateless tests (arm_asan_ubsan, azure, sequential)` + `(amd_asan_ubsan, db disk, distributed plan, sequential)` — `Server died` / `Segmentation fault` (STID 0883-4f96, 1288-3bd5) / `Scraping system tables` / `01079_parallel_alter_modify_zookeeper_long`: the amd/arm asan_ubsan mass outage of 2026-06-10/11 (ASan secondary-allocator OOM). Not PR-related.
  • `Stress test (arm_asan_ubsan)` — `Hung check failed`: chronic Stress arm_asan_ubsan hung-check noise (43 PRs / 78 master, 30d). Not specific to this PR.

Ready for review.

…ttings-simple-aggregate

Resolve conflict in StorageReplicatedMergeTree::alter: master gained the
areNonReplicatedAlterCommands() local-apply block for mixed settings/comment
ALTERs (PR ClickHouse#107142). Keep it and place this PR's sortingKeyTypesChanged guard
on verifySortingKey after that block, so settings/comment-only ALTERs return
early and the suspicious-key check fires only when the sorting key types
actually change.
Comment thread src/Storages/MergeTree/MergeTreeData.cpp
The verifySortingKey gate in StorageMergeTree::alter and
StorageReplicatedMergeTree::alter compared only the resolved sorting-key
data-type names. A MODIFY ORDER BY that swaps one key column for another
of the same type (e.g. (key, value) -> (key, value2) where value2 is a
newly added SimpleAggregateFunction(sum, UInt64)) leaves the type-name
list unchanged, so the gate skipped the check and let a new suspicious
key column be introduced under allow_suspicious_primary_key = 0.

Compare the resolved key column/expression list (KeyDescription.column_names)
in addition to the data types. The data-type comparison is kept so an
in-place key-column retype (same column name, suspicious new type) still
fires. Renamed sortingKeyTypesChanged -> sortingKeyChanged accordingly.
Added a regression case to 04248 covering the column-swap bypass.

Reported by clickhouse-gh[bot] on PR ClickHouse#105111.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (update for the column-swap bypass fix)

# Question Answer
a Deterministic repro? Yes. SET allow_suspicious_primary_key=1; CREATE TABLE t (key Int32, value SimpleAggregateFunction(sum, UInt64)) ENGINE=AggregatingMergeTree PRIMARY KEY key ORDER BY (key, value); SET allow_suspicious_primary_key=0; ALTER TABLE t ADD COLUMN value2 SimpleAggregateFunction(sum, UInt64), MODIFY ORDER BY (key, value2); succeeds on the pre-fix binary every time (also on plain MergeTree).
b Root cause explained? sortingKeyTypesChanged compared only the resolved data-type names. Swapping key column value for the newly added value2 (both SimpleAggregateFunction(sum, UInt64)) keeps the type-name list [Int32, SimpleAggregateFunction(sum, UInt64)] identical, so the helper returns false, verifySortingKey is skipped, and a new suspicious key column is admitted under allow_suspicious_primary_key=0.
c Fix matches root cause? Yes. The helper now also compares KeyDescription.column_names (the resolved key column/expression list), which changes from [key, value] to [key, value2], so the gate fires. No bound widening or tag band-aid.
d Test intent preserved / new tests added? Preserved and extended. The data-type comparison is kept so the existing in-place key-retype case still errors; the #104463 skip cases are unchanged. Added a regression case to 04248 asserting the column-swap is rejected (DATA_TYPE_CANNOT_BE_USED_IN_KEY).
e Both directions demonstrated? Yes, on a Build-ID-verified binary (7a4b28c8c7). Pre-fix: swap succeeds. Post-fix: swap rejected with code 549; all #104463 skip ALTERs still succeed; in-place retype still rejected with code 549.
f Fix is general, not a narrow patch? Yes. The fix lives in the single shared helper called by both StorageMergeTree::alter and StorageReplicatedMergeTree::alter, so both engines are covered. It addresses the source (the gate's change-detection was too weak) rather than guarding the symptom.

Session id: cron:clickhouse-worker-slot-0:20260615-081200

@groeneai

Copy link
Copy Markdown
Contributor Author

CI status for 54664aa37745 (Groene AI CI monitor)

CI fully finished on this head (Finish Workflow + Config Workflow pass; no checks queued/running).

The only red is Fast test (arm_darwin):

  • 01951_distributed_push_down_limit and 01952_optimize_distributed_group_by_sharding_key (EXPLAIN-plan reference mismatch).

This is NOT caused by this PR. It is the known macOS arm_darwin loopback flake: the two tests fail across 40+ unrelated PRs in the last 7 days with 0 master failures. Root cause is leaked lo0 127.0.0.2+ aliases on the reused macOS runner making remote('127.{1,2}', ...) collapse to a local read in the EXPLAIN plan. It is already being fixed (teardown post_hook + darwin.skip for 01951/01952). Bugfix validation (functional tests) JOB passes (its internal check rows are the regression-test-on-master assertion, not a failure). Mergeable Check is red only because of the arm_darwin job.

No PR-caused failure on this head.

@groeneai

Copy link
Copy Markdown
Contributor Author

CI status — HEAD 54664aa37745

CI fully finished (Finish Workflow success, 0 pending). Every failure triaged; none PR-caused. This PR changes only MergeTree ALTER sorting-key validation (verifySortingKey / sortingKeyTypesChanged), which cannot reach the distributed-EXPLAIN, S3-keys, or SharedMergeTree paths below.

Check / test Class Disposition
Bugfix validation: Lost forever for SharedMergeTree / Lost s3 keys / S3_ERROR No such key / OOM in dmesg / Exception in test runner infra (chronic) 44 PRs, 0 master in 7d. Not PR-caused.
Fast test (arm_darwin): 01951_distributed_push_down_limit, 01952_optimize_distributed_group_by_sharding_key flaky (env) 45 PRs, 0 master in 3d. macOS loopback-alias leak tracked in #107435. Not PR-caused.

No PR-caused failures. Awaiting review.

…ttings-simple-aggregate

Resolves a context conflict in StorageReplicatedMergeTree::alter: master
introduced a named metadata_snapshot local; this branch added an old_sorting_key
snapshot used to gate verifySortingKey. Kept both.
The 104463 fix also gates verifySortingKey in StorageReplicatedMergeTree::alter,
but the stateless test only exercised the non-replicated MergeTree paths.
03020_order_by_SimpleAggregateFunction already covers the replicated rejection
(a real sorting-key change is still rejected); add the new skip path: a non-key
ALTER (MODIFY SETTING / MODIFY COMMENT / ADD COLUMN) on a grandfathered
ReplicatedAggregatingMergeTree with a SimpleAggregateFunction sorting key must
succeed under allow_suspicious_primary_key = 0. The {database} macro keeps the
ZooKeeper path unique so the test stays parallel-safe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. CREATE TABLE ... ENGINE = ReplicatedAggregatingMergeTree(...) ORDER BY (key, value) with a SimpleAggregateFunction key under allow_suspicious_primary_key = 1, then a non-key ALTER ... MODIFY SETTING under allow_suspicious_primary_key = 0.
b Root cause explained? StorageReplicatedMergeTree::alter previously called verifySortingKey unconditionally when allow_suspicious_primary_key was off, so any ALTER on a grandfathered suspicious-key table re-validated the unchanged key and was rejected. The PR gates that call behind sortingKeyChanged. The non-replicated paths were tested; the replicated skip path was not.
c Fix matches root cause? This is a test-coverage addition for the already-merged fix. The new case directly exercises the gated verifySortingKey call in StorageReplicatedMergeTree::alter.
d Test intent preserved / new tests added? Added a ReplicatedAggregatingMergeTree case to 04248_104463_*. No existing assertion weakened. 03020_order_by_SimpleAggregateFunction still covers the replicated rejection direction.
e Both directions demonstrated? With fix: the replicated non-key ALTERs succeed (verified against branch binary 7a4b28c8c7 with embedded Keeper; whole test file outputs 1\n1\n1). Without fix: the removed pre-fix line was if (!allow_suspicious_primary_key) verifySortingKey(...) unconditional, which rejects the same ALTERs with DATA_TYPE_CANNOT_BE_USED_IN_KEY (this is the exact rejection 03020 proves on the key-change side).
f Fix is general, not a narrow patch? N/A (test-only change). The underlying fix is shared between StorageMergeTree::alter and StorageReplicatedMergeTree::alter via the single MergeTreeData::sortingKeyChanged helper; both now have stateless coverage.

Session id: cron:clickhouse-worker-slot-0:20260621-075100

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on HEAD 7640b5fb8ec0 (test-only commit covering the Replicated skip path in the 04248 regression test). Both failing jobs are pre-existing CI flakes, not caused by this PR (which only reorders verifySortingKey past the settings/comment/non-sorting-key ALTER branches in the MergeTree/ReplicatedMergeTree alter() paths):

No PR-caused failures. (CH Inc sync PENDING is the fork-sync check, not a CI test result.)

Session: cron:our-pr-ci-monitor:20260621-133000

The master merge brought in `04248_97771_lazy_materialization_alias_invalid_chunk`,
which shares the `04248` numeric prefix with this PR's regression test. Renumber
this PR's test to the next free prefix (`04498`) to keep test numbers unique.
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged current master (branch was ~12 days behind) and pushed to re-run CI.

One conflict-free follow-up was needed: the master merge introduced 04248_97771_lazy_materialization_alias_invalid_chunk, which collided on the 04248 numeric prefix with this PR's regression test. Renumbered our test 04248_104463_… → 04498_104463_… (next free prefix) to keep test numbers unique; contents unchanged.

Verified the three changed translation units (StorageMergeTree.cpp, StorageReplicatedMergeTree.cpp, MergeTreeData.cpp) still compile against the merged tree (-fsyntax-only, exit 0) — no API drift. Confirmed the bug still reproduces on current master (the unconditional verifySortingKey is still there pre-fix) and issue #104463 is still open, so the fix remains needed.

The two reds on the previous head (7640b5fb) were infra/flaky and unrelated to this MergeTree-ALTER change:

  • Integration tests (amd_asan_ubsan, db disk, old analyzer, 5/6) — job ERROR at 0s (never ran any test).
  • Stress test (arm_tsan)SOCKET_TIMEOUT on SYSTEM STOP DISTRIBUTED SENDS (stress-harness timeout).

Both will re-run on the new head.

@clickhouse-gh

clickhouse-gh Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 31/34 (91.18%) · Uncovered code

Full report · Diff report

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

CI finish ledger — b81feda

CI fully finished (Finish Workflow + Mergeable Check pass). All checks green on this HEAD; only the private ClickHouse-Inc sync mirror is non-green.

Check / test Reason Owner / fixing PR
CH Inc sync private fork-sync mirror CH Inc sync (private, not actionable)

Session id: cron:our-pr-ci-monitor:20260703-053000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Settings-only ALTER on MergeTree with SimpleAggregateFunction in ORDER BY fails unexpectedly

4 participants