Fix Inconsistent AST formatting logical error on CREATE INDEX with extra-parenthesized expression by groeneai · Pull Request #109175 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix Inconsistent AST formatting logical error on CREATE INDEX with extra-parenthesized expression#109175

Open
groeneai wants to merge 4 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-index-expr-paren-round-trip
Open

Fix Inconsistent AST formatting logical error on CREATE INDEX with extra-parenthesized expression#109175
groeneai wants to merge 4 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-index-expr-paren-round-trip

Conversation

@groeneai

@groeneai groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Closes: #109163

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed an Inconsistent AST formatting logical error (server abort in debug and sanitizer builds) for CREATE INDEX / CREATE HYPOTHETICAL INDEX expressions that could not survive a format-parse-format round trip, for example CREATE HYPOTHETICAL INDEX i0 ON t0 ((a())) TYPE a and CREATE INDEX i0 ON t0 ((a, b).1) TYPE minmax.

Description

Reported in #109163. CREATE HYPOTHETICAL INDEX i0 ON t0 ((a())) TYPE a (and the plain CREATE INDEX form) raised an Inconsistent AST formatting logical error, which aborts the server in debug and sanitizer builds. There are two independent causes, both fixed here.

  1. Redundant parenthesization flag. ParserCreateIndexDeclaration consumes the index's own opening bracket and parses the inner content as an ORDER BY list. When the single index expression is itself wrapped in extra parentheses (((a())) -> inner (a())), RoundBracketsLayer sets the parenthesized flag on the resulting node. Formatting re-emits it as (a()); on re-parsing, the parser consumes those parens as the index bracket, so the re-parsed node has no flag and formats as a(). The round trip diverges by one paren level. Fix: clear the redundant top-level parenthesized flag after extracting the single expression (the index bracket already groups it).

  2. Canonical form starting with a parenthesis. A single expression whose formatted form itself starts with a ( that closes before the end ((a, b).1, (x, y) -> x, (a + b) * c) has no parenthesized flag, but formatting it bare after the index keyword lets the re-parse swallow that leading ( as the index bracket, split the inner list, and drop the trailing operator (a SYNTAX_ERROR or a diverging AST). Some of these were already broken before this PR. Fix: in ASTIndexDeclaration::formatImpl, re-wrap such single expressions in an extra (...) so the index bracket stays distinguishable. Forms whose leading ( already encloses the whole expression (a bare tuple (a, b), (expr AS alias)) are left unchanged, so multi-column index formatting is unaffected. Parentheses inside string / quoted-identifier literals are ignored by the check.

Regression test: tests/queries/0_stateless/04502_inconsistent_ast_create_index_expr_paren_round_trip.sql.

…] INDEX with extra-parenthesized expression

CREATE HYPOTHETICAL INDEX i0 ON t0 ((a())) TYPE a (and the plain CREATE INDEX
form) aborted the server with an "Inconsistent AST formatting" logical error in
debug and sanitizer builds (issue ClickHouse#109163).

ParserCreateIndexDeclaration consumes the index's own opening bracket, then parses
the inner content as an ORDER BY list. When the single index expression is itself
wrapped in extra parentheses (e.g. ((a())) -> inner (a())), RoundBracketsLayer sets
the parenthesized flag on the resulting expression node. On formatting, that flag
re-emits the parens as (a()); on re-parsing, the parser consumes those parens as the
index bracket, so the re-parsed expression has no parenthesized flag and formats as
a(). The format-parse-format round trip therefore diverges by one paren level and
trips the consistency check in executeQueryImpl.

The index bracket already groups the expression, so a top-level parenthesized flag
is redundant grouping that cannot round-trip. Clear it after extracting the single
expression. Multi-element tuples and alias expressions are unaffected: the tuple
parens are part of the value representation, and alias parens are emitted via the
formatter's need_parens path, not the parenthesized flag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

cc @evillique @george-larionov for review, and @PedroTadim (this unblocks the fuzzer features PR).

The fix is in the shared ParserCreateIndexDeclaration: after the index's own (...) is consumed, a top-level parenthesized flag on the single index expression (from input like ((a()))) is redundant grouping that cannot survive a format-parse round trip, so it is cleared. Covers both CREATE INDEX and CREATE HYPOTHETICAL INDEX. Multi-element tuples and alias expressions are unaffected.

@PedroTadim PedroTadim added the can be tested Allows running workflows for external contributors label Jul 2, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [3c14b20]

Summary:

job_name test_name status info comment
Stress test (arm_tsan) FAIL
ThreadSanitizer: data race (STID: 1464-31ed) FAIL cidb

AI Review

Summary

This PR fixes several CREATE INDEX / CREATE HYPOTHETICAL INDEX formatter round-trip failures by stripping redundant top-level parenthesization, re-wrapping syntax-critical single-expression forms during formatting, and preserving part_of_create_index_query across clone. I did not find any remaining correctness, compatibility, or coverage issues in the current code.

Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jul 2, 2026
Comment thread src/Parsers/ParserCreateIndexQuery.cpp
…al form starts with a parenthesis

The previous blanket setParenthesized(false) in ParserCreateIndexDeclaration fixed the
redundant-parenthesization class (e.g. ((a()))), but a second, independent class still
broke the format-parse-format round trip: a single index expression whose canonical form
itself starts with a `(` that closes before the end of the expression.

Examples: (a, b).1 (tupleElement of a leading tuple), (x, y) -> x (lambda with tuple
params), (a + b) * c (leading precedence group). ParserCreateIndexDeclaration always
consumes one leading `(` as the index's own bracket, so formatting these bare lets the
re-parse swallow that `(` as the bracket, split the inner list, and drop the trailing
operator, producing a SYNTAX_ERROR or an AST that diverges on re-format. This is not a
parenthesized-flag issue, so clearing the flag cannot fix it. Some of these forms were
already broken on master before the flag change; this addresses the whole class.

Fix in ASTIndexDeclaration::formatImpl: for a single expression in a CREATE INDEX
declaration, format it, and if the result starts with a `(` whose matching `)` is not the
last character, re-wrap it in an extra `(...)` so the index bracket stays distinguishable.
Forms whose leading `(` already encloses the whole expression (a bare tuple (a, b),
(expr AS alias)) are left unchanged, so multi-column index formatting is unaffected.
Parentheses inside string / quoted-identifier literals are ignored by the check.

Extended the regression test with round-trip and concrete-output cases for the
leading-parenthesis class and non-regression cases for multi-element tuples and aliases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up (addressing the bot review): the fix now also covers the leading-parenthesis class via ASTIndexDeclaration::formatImpl. Re-validated pre-PR gate below.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. formatQuerySingleLine('CREATE INDEX i ON t ((a, b).1) TYPE minmax') yields (a, b).1, which re-parses to SYNTAX_ERROR every time. Same for (x, y) -> x, (a + b) * c.
b Root cause explained? ParserCreateIndexDeclaration always consumes one leading ( as the index bracket. When the single expression's canonical form starts with a ( that closes before the end, formatting it bare lets the re-parse swallow that ( as the bracket, split the inner list, and drop the trailing operator, so the format-parse-format round trip diverges (LOGICAL_ERROR in debug/sanitizer). This is distinct from the redundant-parenthesization-flag class.
c Fix matches root cause? Yes. ASTIndexDeclaration::formatImpl re-wraps exactly those single expressions in an extra (...) so the index bracket stays distinguishable. The parser setParenthesized(false) still handles the redundant-flag class (((a())) -> a()).
d Test intent preserved / new tests added? Yes. Regression test 04502 extended with round-trip and concrete-output cases for (a, b).1, (x, y) -> a, (a + b) * a plus non-regression cases for multi-element tuples and aliases. No assertions weakened.
e Both directions demonstrated? Yes. Verified on a master-equivalent build (fails/diverges) and on the fixed build (round-trips). ((a())) now returns UNKNOWN_FUNCTION instead of the LOGICAL_ERROR.
f Fix is general across code paths? Yes. Single choke point in formatImpl covers every single-expression leading-paren form (tupleElement, lambda, precedence group), not one symptom. The parser change and the part_of_create_index_query = false (SHOW CREATE / ALTER ADD INDEX) path are unaffected.
g Fix generalizes across inputs? Yes. Tested tupleElement, lambda, precedence-paren, string-literal-containing-paren (concat(c, ')('), ('(', b).1), multi-column tuple, and alias. The leading-paren check is quote-aware so literal parentheses do not confuse it.
h Backward compatible? Yes. Only affects formatting of standalone CREATE INDEX expressions that previously could not round-trip. Common (a, b) output is unchanged. No setting, on-disk, or wire format change.
i Invariants and contracts preserved? Yes. Restores the format(x) == format(format(x)) round-trip invariant; the semantic AST (e.g. tupleElement) is preserved; multi-column and alias formatting paths are untouched.

Session id: cron:clickhouse-worker-slot-10:20260702-133100

Comment thread src/Parsers/ASTIndexDeclaration.cpp Outdated
…EX round-trip

leadingParenClosesEarly only re-wrapped single expressions whose first `(`
closes before the end of the formatted text. A scalar subquery / VALUES
expression formats as `(SELECT ...)` with its matching `)` at the very end, so
the scan missed it: `CREATE INDEX i ON t ((SELECT 1)) TYPE minmax` formatted to
`(SELECT 1)`, and re-parsing consumed that leading `(` as the index bracket,
leaving a bare `SELECT 1` that ParserOrderByExpressionList rejects.

Key the re-wrap off the AST kind for the ASTSubquery class in addition to the
early-closing scan. Skip it when the expression already carries an alias (the
alias forces its own enclosing bracket via need_parens), so aliased subqueries
are not over-wrapped. Extends the 04502 regression test with `((SELECT 1))`,
`((VALUES (1)))`, a correlated subquery, HYPOTHETICAL, and the aliased-subquery
non-regression case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/Parsers/ASTIndexDeclaration.cpp
ASTIndexDeclaration::clone rebuilt the declaration but copied only
granularity, dropping part_of_create_index_query. That flag selects the
CREATE INDEX formatting (index bracket plus the extra wrapper this PR
restores for parenthesized expressions), so a clone that loses it formats
via the column-list branch and no longer round-trips: clone()->format()
diverged from format().

ASTHypotheticalIndexQuery::clone already worked around this by re-setting
the flag on the cloned declaration; fixing clone() itself covers that path
and any other clone site, so the workaround is removed.

Adds a gtest asserting clone()->format() == format() and a format-reparse
round trip for the parenthesized CREATE INDEX / CREATE HYPOTHETICAL INDEX /
ON CLUSTER forms.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate for the clone() fix (3c14b20), addressing the third review round.

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. gtest_Parser ParserCreateIndexQuery.ClonePreservesCreateIndexFormatting fails deterministically on the pre-fix clone(): for CREATE INDEX i ON t ((a, b).1) TYPE minmax, ast->clone()->format() yields ... i (a, b).1 ... (column-list branch) vs the original ... ((a, b).1) ....
b Root cause explained? ASTIndexDeclaration::clone() rebuilds the node via the ctor and copied only granularity, dropping part_of_create_index_query. That flag selects the CREATE INDEX formatting (index bracket + the re-wrap this PR adds), so a clone without it formats via the column-list branch and clone()->format() diverges from format().
c Fix matches root cause? Yes. The flag is copied in clone() itself, restoring the deep-copy contract at the source rather than patching one caller. The redundant re-set in ASTHypotheticalIndexQuery::clone is removed.
d Test intent preserved / new tests added? New gtest added (clone-format equality + format-reparse round trip) covering the parenthesized CREATE INDEX / HYPOTHETICAL / ON CLUSTER forms. Existing 04502 and all 101 gtest_Parser tests still pass.
e Both directions demonstrated? Yes. gtest FAILS on a pre-fix binary (verified) and PASSES with the fix. Also verified end to end on a fix-reverted server that no user-visible CREATE INDEX path regresses.
f Fix general across code paths? Yes. Fixing the base clone() covers ASTCreateIndexQuery, ASTHypotheticalIndexQuery, and any future clone site in one place. convertToASTAlterCommand still explicitly sets the flag false right after cloning, so the ALTER rewrite is unaffected.
g Fix generalizes across inputs? Yes. gtest matrix covers tupleElement (a, b).1, lambda, precedence (a + b) * a, subquery (SELECT 1), bare tuple (a, b), ON CLUSTER, and HYPOTHETICAL. All clone-format-stable.
h Backward compatible? Yes. No setting, format, or on-disk change. clone() now copies a flag it previously dropped; output only changes for AST clones that were already malformed.
i Invariants and contracts preserved? Yes. Restores the clone() deep-copy contract (a clone must format identically to the original). ASTIndexDeclaration has three state members (name, granularity, flag), all now copied. No child-ordering or tree-hash change.

Note: I could not reproduce a user-visible ON CLUSTER failure from the flag drop; CREATE INDEX ... ON CLUSTER is rewritten to ALTER TABLE ADD INDEX before distribution. Details in the review thread. The fix stands as a clone() contract correction, backed by the gtest.

Session id: cron:clickhouse-worker-slot-3:20260702-163000

@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.70% 92.70% +0.00%
Branches 77.70% 77.70% +0.00%

Changed lines: Changed C/C++ lines covered: 50/59 (84.75%) · Uncovered code

Full report · Diff report

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

CI finish ledger — 3c14b20

CI is complete on this head. No failure below is caused by this PR's diff (a Parsers AST-formatting round-trip fix + regression tests). Only the private CH Inc sync is exempt.

Check / test Reason Owner / fixing PR
Stress test (arm_tsan) / ThreadSanitizer: data race (STID 1464-31ed) — HashTableCell / hash-table aggregation flaky (data race, 3 hits / 3 unrelated PRs / 0 master in 30d) — a Parsers-only PR cannot introduce a hash-table race a fix task is created (investigating at full effort; fixing-PR link to follow here)
Mergeable Check / PR rollup of the above resolves once the above clears
CH Inc sync - CH Inc sync (private, not actionable)

Session id: cron:our-pr-ci-monitor:20260703-000000

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

CI finish ledger — 3c14b20

Every failure below has an owner. Only CH Inc sync is exempt.

Check / test Reason Owner / fixing PR
Stress test (arm_tsan) / ThreadSanitizer: data race (STID 1464-31ed) data race (UniqExactSet parallel-merge COW use_count TOCTOU on the aliased two-level-set pointee; residual after merged #108928 which relocated but did not eliminate the racy guard) a fix task is created (investigating at full effort; fixing-PR link to follow here)
Mergeable Check / PR rollup of the above owned via the above
CH Inc sync CH Inc sync (private, not actionable)

No PR-caused failure on this SHA. #109175's diff is a one-line ParserCreateIndexQuery.cpp change (clear a redundant top-level parenthesized flag on the index expression) plus a regression test; it cannot cause a uniqExact aggregate parallel-merge hash-set data race. Note: merged PR #108928 (external, Closes #108912) is ALREADY on this branch, so it does NOT resolve this race (the failure fired with the fix present) — a full-effort fix task owns the residual race instead.

Session id: cron:our-pr-ci-monitor:20260703-013000

@groeneai

groeneai commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'Inconsistent AST formatting between 'Index' and 'Index'

2 participants