Format operators as functions in EXPLAIN SYNTAX and toAST#94681
Format operators as functions in EXPLAIN SYNTAX and toAST#94681alexey-milovidov merged 55 commits into
Conversation
ed887dd to
072c87c
Compare
|
Workflow [PR], commit [5bd9444] Summary: ✅
AI ReviewSummaryThis PR makes Final VerdictStatus: ✅ Approve |
|
The test to code ratio is 99:1. I like that. |
|
@novikd Like to check? (I'm not too familiar with the analyzer internals) |
|
@1abdelhalim The next step is to go through all test failures and fix them, thanks. |
4adca3d to
f0d81b3
Compare
novikd
left a comment
There was a problem hiding this comment.
I guess the problem is that it won't pass the tests with disabled analyzer.
This implements the core fix for issue ClickHouse#94603 by modifying FunctionNode::toASTImpl() to always format operators as function calls, and adds comprehensive tests.
Update all stateless test reference files to reflect the new operator formatting behavior where operators are displayed as function calls (e.g., plus(a, b) instead of a + b) in EXPLAIN SYNTAX and toAST output. This includes updates to both .reference and .oldanalyzer.reference files to match the new formatting behavior across all test scenarios.
f5d07f7 to
4a4ffcf
Compare
Per reviewer feedback: - Remove redundant 'function_ast->is_operator = false;' assignment (is_operator defaults to false in ASTFunction) - Remove is_operator field from FunctionNode class since it's no longer needed (operators are always formatted as functions) - Make markAsOperator() a no-op to maintain API compatibility
…nment Restore is_operator field to maintain compatibility with existing code. The key change remains: removed redundant 'function_ast->is_operator = false;' assignment in toASTImpl() since is_operator defaults to false in ASTFunction. This addresses the reviewer's main feedback while ensuring no breakage.
While is_operator defaults to false in ASTFunction, explicitly setting it ensures correct behavior. The assignment is kept for safety and correctness.
Restore the original comments that were in the working commit to ensure exact code match and avoid any potential issues.
Restore use_source_expression_for_constants option and related changes that were present in the working 66-commit version but lost during cleanup. These changes are needed for Fast test to pass.
|
@1abdelhalim, ok, more test references have to be updated. |
Per reviewer feedback: add SET enable_analyzer = 1; to tests that don't pass with disabled analyzer.
Fix unit tests that were failing because toAST() now formats operators as functions. Update expected strings to use function notation: - LIKE/ILIKE operators → like()/ilike() functions - Comparison operators → less(), greater(), equals(), etc. - Logical operators → and(), or(), not() functions
Fix 9 failing stateless tests by updating .reference files to convert operators to function notation in EXPLAIN SYNTAX and EXPLAIN QUERY TREE output: - 00597_push_down_predicate_long: Convert *, +, and = operators - 01271_optimize_arithmetic_operations_in_aggr_func_long: Convert +, -, *, / operators (all instances) - 01798_uniq_theta_sketch: Convert % operator and -x to negate(x) - 02226_analyzer_or_like_combine: Convert OR operators to or() function and fix SETTINGS spacing - 02479_mysql_connect_to_self: Convert _CAST to cast - 02967_parallel_replicas_join_algo_and_analyzer_1/2/3: Convert _CAST and = operators - 03255_parallel_replicas_join_algo_and_analyzer_4: Convert = operator All operators are now formatted as functions as required by ClickHouse#94603.
- 00597_push_down_predicate_long: Convert _CAST and < operators - 01271_optimize_arithmetic_operations_in_aggr_func_long: Convert remaining +, -, *, / operators in EXPLAIN SYNTAX output
dce07ce to
effc6e7
Compare
- 01798_uniq_theta_sketch: Convert bitNot(-x) to bitNot(negate(x))
- 02479_mysql_connect_to_self: Change cast('default', 'String') to currentDatabase() in EXPLAIN output
- 02967_parallel_replicas_join_algo_and_analyzer_1/2: Convert cast() to _CAST() in trace output
…YNTAX The old-analyzer `Stateless tests` job compared against stale `.oldanalyzer.reference` files. For tests where a top-level `SET enable_analyzer = 1` was added, the old-analyzer job now produces the same output as the new analyzer (the setting overrides the job default), so the separate `.oldanalyzer.reference` is meaningless and can never match — it is removed so the runner falls back to `.reference`. For the analyzer-dependent tests that do not pin the setting (`00826_cross_to_inner_join`, `01029_early_constant_folding`, `02226_analyzer_or_like_combine`, `02428_parameterized_view`, `02477_is_null_parser`), the `.oldanalyzer.reference` files were stale (they predated the operator-to-function and redundant-parenthesis-suppression changes) and are regenerated to match the current `EXPLAIN SYNTAX` output.
…-as-functions # Conflicts: # tests/queries/0_stateless/02428_parameterized_view.reference
These changes to CI workflow / infrastructure files were unrelated to the purpose of this pull request and were introduced accidentally (stale base branch). Restoring them to match `master`.
… form The new test `04105_explain_syntax_parameterized_view` (added in `master` via ClickHouse#103249) was merged into this branch. Its `EXPLAIN SYNTAX` output prints comparison operators, which this PR rewrites into their function form (`number > 10` -> `greater(number, 10)`, `n = 1` -> `equals(n, 1)`). Regenerate both the default and `.oldanalyzer` references from the built binary so they match the function-form output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…SYNTAX The comment claimed that `EXPLAIN SYNTAX` prefers operators over function calls, but the code sets `allow_operators = false` for that path, so the actual contract is the opposite: function calls are preferred over operator syntax. Flip the wording to match the code and avoid preserving the wrong invariant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… calls in function-call form The master regression test `04241_inconsistent_ast_codec_tuple_paren_round_trip_1941_1bfa` (STID 1941-1bfa) was added after this branch last synced and asserts the narrower master fix, which only suppressed the redundant `parenthesized` flag's parens for multi-argument `Function_tuple` in `allow_operators = false` contexts (`CODEC` / `STATISTICS` / `BACKUP_NAME`). This PR generalizes that suppression in `decideParensEmission` and `ASTFunction::formatImplWithoutAlias` to every function call in function-call form, so the inner `not(...)` paren is also dropped: `not((not(tuple(...))))` now formats as `not(not(tuple(...)))`. Both forms denote the same AST and the format-parse-format round-trip remains stable (the round-trip check still fires `UNKNOWN_CODEC` / `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`, not `Inconsistent AST formatting`), so this is a strictly cleaner output in the same direction as the original fix. Verified against the freshly built binary that every line of the test reproduces, including the server-side round-trip assertions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged
Both forms denote the same AST and the format-parse-format round-trip stays stable — verified against the freshly built binary that the round-trip assertions still fire Note: the test's prose comments still describe The two remaining unresolved review threads are both on |
…-as-functions # Conflicts: # tests/queries/0_stateless/01029_early_constant_folding.oldanalyzer.reference
|
Merged Built the binary after the merge and verified the formatting paths still hold against it:
All review threads are resolved and CI was green before the merge; fresh CI will run on the new commit. |
|
Merged The two red checks were the known read-in-order virtual-row flake ( Built the binary after the merge and verified the formatting paths against it:
(The handful of local failures during verification were all due to the minimal single-node test server lacking clusters / ZooKeeper / a second |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 48/50 (96.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code |
There was a problem hiding this comment.
EXPLAIN SYNTAX would have a slightly incorrect result, because it won't show the constant folded value.
There was a problem hiding this comment.
Why should syntax show constant-folded values? Constant-folding is an optimization, not a syntax thing.
Gcc is often criticized for that.
There was a problem hiding this comment.
@alexey-milovidov Because EXPLAIN SYNTAX has always been showing an optimized query. Query Tree level optimizations will still be shown here, but constant folding is not. That is a strange behavior.
…operator-as-function formatting After merging master, `EXPLAIN SYNTAX` renders operators in function-call form (`a OR b` -> `or(a, b)`, `s LIKE 'x'` -> `like(s, 'x')`) following #94681 (`fix-94603-operators-as-functions`). Regenerate the affected references: - `04357_optimize_or_like_chain_non_deterministic.reference` - `02226_analyzer_or_like_combine.reference` (also picks up benign query-tree node-id renumbering for one query; the tree structure is unchanged) - `02226_analyzer_or_like_combine.oldanalyzer.reference` Only the formatting of the `EXPLAIN SYNTAX` output changed; the rewrites and query semantics are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…master Master PR ClickHouse#94681 made `FunctionNode::toASTImpl` always format operators as function calls in `EXPLAIN SYNTAX` / `toAST` (e.g. `a IN b` renders as `in(a, b)`, `a < b` as `less(a, b)`, `a AND b` as `and(a, b)`, `a NOT LIKE b` as `notLike(a, b)`). After merging `master`, the inverse-dictionary-lookup and `dictGetKeys` plan dumps render in the new function-call form. Regenerated the references. The change is purely cosmetic plan formatting: every query result and data row is byte-identical to the previous references; only the rendering of operators inside `EXPLAIN` plans changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolves #94603
EXPLAIN SYNTAXand the analyzer explain formatting paths now print operators as function calls (for exampleplus(1, 2)instead of1 + 2) by formatting AST output withallow_operators = falseinInterpreterExplainQuery.The change also introduces
ConvertToASTOptions.use_source_expression_for_constantsand uses it only for EXPLAIN conversion paths to preserve expected constant rendering without changing unrelatedtoASTcallers.Context: #94681 (comment)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
EXPLAIN SYNTAXnow formats operators as function calls consistently in explain output (for exampleplus(1, 2)instead of1 + 2).Documentation entry for user-facing changes
No documentation changes needed. This is an output-formatting consistency change in EXPLAIN output and does not change query semantics.
Note
Low Risk
Primarily changes
EXPLAIN/toASTformatting behavior and golden test expectations; low risk to query semantics but may affect user-facing explain output and tooling that parses it.Overview
Changes
EXPLAINformatting for analyzer-backed paths.InterpreterExplainQuerynow formats query-tree-derived AST with an explicitallow_operatorssetting (to distinguishEXPLAIN SYNTAXvsEXPLAIN QUERY TREEoutput), and routes formatting through the extendedtoASToptions.Adds constant rendering control in query-tree → AST conversion. Introduces
ConvertToASTOptions.use_source_expression_for_constantsand updatesConstantNode::toASTImplto emit the originalsource_expressionwhen requested, preserving expected literal/constant appearance in explain output.Updates tests to the new canonical explain strings. Numerous stateless query references and analyzer pass tests are adjusted to match the new operator/function-call formatting and several
.sqltests explicitly enable the analyzer.Reviewed by Cursor Bugbot for commit 16e0d24. Bugbot is set up for automated code reviews on this repo. Configure here.
Version info
26.6.1.756