Fix logical error "Column identifier ... is already registered" in mutations by alexey-milovidov · Pull Request #106414 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix logical error "Column identifier ... is already registered" in mutations#106414

Merged
alexey-milovidov merged 4 commits into
masterfrom
fix-mutation-in-subquery-column-identifier
Jun 13, 2026
Merged

Fix logical error "Column identifier ... is already registered" in mutations#106414
alexey-milovidov merged 4 commits into
masterfrom
fix-mutation-in-subquery-column-identifier

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Jun 4, 2026

Copy link
Copy Markdown
Member

A mutation (lightweight DELETE, ALTER UPDATE, ALTER DELETE) whose predicate contains an IN / EXISTS subquery that reads from a default table expression nested in another subquery threw a logical error such as Column identifier dummy is already registered.

For example:

CREATE TABLE test_filter (a Int32, b Int32, c Int32) ENGINE = MergeTree ORDER BY a;
DELETE FROM test_filter WHERE exists((SELECT DISTINCT * LIMIT 452));

exists((subquery)) is rewritten to 1 IN (SELECT 1 FROM (subquery) ...), and the inner SELECT * over a FROM-less query reads system.one (column dummy). On the analyzer mutation path, buildSubqueryPlansForSetsAndAdd plans each set subquery standalone with its own GlobalPlannerContext, but the detached subquery tree carried no unique table aliases. Two distinct table expressions exposing a column with the same name both produced a bare column identifier, and GlobalPlannerContext::createColumnIdentifier threw because the identifier was already registered.

The regular planner path (addBuildSubqueriesForSetsStepIfNeeded) collects sets from an already-aliased query tree, so the collision never occurs there. The fix re-applies createUniqueAliasesIfNecessary to the detached set subquery before planning it on the mutation path.

Found by the AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=33b37ca22cfe6d153b35f57f36d8e649750e2061&name_0=MasterCI&name_1=AST%20fuzzer%20%28amd_debug%29

Closes #99931 (same STID 4697-4326), related to #100678.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a logical error Column identifier ... is already registered when a mutation (DELETE/UPDATE) predicate contains an IN/EXISTS subquery that reads from a default table expression nested in another subquery.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

Version info

  • Merged into: 26.6.1.742

…tations

A mutation (lightweight `DELETE`, `ALTER UPDATE`, `ALTER DELETE`) whose
predicate contains an `IN` / `EXISTS` subquery that reads from a default
table expression nested in another subquery threw a logical error such as
`Column identifier dummy is already registered`.

`DELETE FROM test_filter WHERE exists((SELECT DISTINCT * LIMIT 452))` is
rewritten to `1 IN (SELECT 1 FROM (SELECT DISTINCT * LIMIT 452) ...)`, where
the inner `SELECT *` reads `system.one` (column `dummy`). On the analyzer
mutation path, `buildSubqueryPlansForSetsAndAdd` plans each set subquery
standalone with its own `GlobalPlannerContext`, but the detached subquery
tree had no unique table aliases. Two distinct table expressions exposing a
column with the same name both produced a bare column identifier, and
`GlobalPlannerContext::createColumnIdentifier` threw because the identifier
was already registered.

The regular planner path (`addBuildSubqueriesForSetsStepIfNeeded`) collects
sets from an already-aliased query tree, so this never happens there. Re-apply
`createUniqueAliasesIfNecessary` to the detached set subquery before planning
it on the mutation path.

Found by the AST fuzzer:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=33b37ca22cfe6d153b35f57f36d8e649750e2061&name_0=MasterCI&name_1=AST%20fuzzer%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 4, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged master (was 570 commits behind and red). The merge was conflict-free; the test 04305_mutation_in_subquery_column_identifier shares only a numeric prefix with existing 04305_* tests (unique full name), so no renumbering was needed.

Empirically verified the fix end-to-end:

  • With the fix reverted, the test reproduces the reported logical error: Code: 49. DB::Exception: Column identifier dummy is already registered. (LOGICAL_ERROR).
  • With the fix in place, the test passes with the expected output (0 0 0 0 8).

Built clean after the merge.

The two red CI checks are pre-existing, unrelated master failures:

  • FunctionsStress (function_prop_fuzzer, Logical error: 'key0->size() > i') — recurs on master 1–10×/day (tracked under FunctionsStress.stress #104877).
  • 03008_s3_plain_rewritable_fault — a flaky S3 fault-injection test that also failed on master on the same day (06-04).

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged master (branch was 371 commits behind and red) — clean merge, no conflicts. Rebuilt clickhouse and re-ran 04305_mutation_in_subquery_column_identifier locally; it passes (output matches the reference).

The only red is the ARM Performance Comparison shard, where both ::old and ::new of unrelated benchmarks (logical_functions_small, jit_aggregate_functions, rand) report "slower". The machine-readable report says "No significant performance changes detected" — this is benchmark instability, not a regression, and these benchmarks don't exercise the mutation/analyzer code path touched by this PR.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged master into the branch (was ~1131 commits / 2 days behind and red) to get a clean CI run. No conflicts; the fix in MutationsInterpreter.cpp (re-applying createUniqueAliasesIfNecessary to the detached set subquery) is intact. Rebuilt clickhouse and the regression test 04305_mutation_in_subquery_column_identifier passes locally.

The two CI failure categories were both unrelated to this analyzer/mutation-planner change:

  • Unit tests (tsan, function_prop_fuzzer) / FunctionsStress — the keyed-SipHash assertion key0->size() > i (FunctionsHashing.h), tracked in FunctionsStress.stress #104877. This is fixed on master by 7d70c56bf81 ("Fix out-of-bounds key read in keyed SipHash over empty nested arrays"), now included via the merge.
  • Performance Comparison (arm_release)rand, cryptographic_hashes, entropy, jit_aggregate_functions flagged "slower"/"unstable"; these are perf noise on functions this PR does not touch.

@clickhouse-gh

clickhouse-gh Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.60% 84.60% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.20% 77.20% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 4/4 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acceptable.

@alexey-milovidov alexey-milovidov self-assigned this Jun 13, 2026
@alexey-milovidov alexey-milovidov merged commit e95c7c9 into master Jun 13, 2026
165 of 166 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-mutation-in-subquery-column-identifier branch June 13, 2026 12:44
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
alexey-milovidov added a commit to groeneai/ClickHouse that referenced this pull request Jun 30, 2026
…ions

Per @novikd's review (ClickHouse#106025
"remove the code added in ClickHouse#106414 (keep the test)") and the unresolved
thread asking to reuse `traverseQueryTree` instead of a new visitor.

Both halves of this PR's earlier code are now redundant with `master`:

* The non-correlated `EXISTS` crash-fix (`createUniqueAliasesIfNecessary`
  on the detached subquery) landed independently on `master` via ClickHouse#106414
  (`MutationsInterpreter.cpp`).
* Correlated subqueries in mutations are already rejected with
  `NOT_IMPLEMENTED` on `master` by `correlated_subtrees.assertEmpty("in
  mutation filter"/"in mutation update")` in
  `MutationsInterpreter::prepareMutationStages`.

This PR's `ValidateNoCorrelatedSubqueryVisitor` therefore added no
behavior over `master`; `Bugfix validation` confirms the regression test
`04303_stid_4697_4326_mutation_exists_column_identifier` already passes on
the unfixed base, so every case it covers is handled by the existing
`master` code. Restore `MutationsInterpreter.cpp` to `master` and keep only
the regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: Column identifier A is already registered (STID: 4697-4326)

2 participants