Format operators as functions in EXPLAIN SYNTAX and toAST by 1abdelhalim · Pull Request #94681 · ClickHouse/ClickHouse · GitHub
Skip to content

Format operators as functions in EXPLAIN SYNTAX and toAST#94681

Merged
alexey-milovidov merged 55 commits into
ClickHouse:masterfrom
1abdelhalim:fix-94603-operators-as-functions
Jun 13, 2026
Merged

Format operators as functions in EXPLAIN SYNTAX and toAST#94681
alexey-milovidov merged 55 commits into
ClickHouse:masterfrom
1abdelhalim:fix-94603-operators-as-functions

Conversation

@1abdelhalim

@1abdelhalim 1abdelhalim commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Resolves #94603

EXPLAIN SYNTAX and the analyzer explain formatting paths now print operators as function calls (for example plus(1, 2) instead of 1 + 2) by formatting AST output with allow_operators = false in InterpreterExplainQuery.

The change also introduces ConvertToASTOptions.use_source_expression_for_constants and uses it only for EXPLAIN conversion paths to preserve expected constant rendering without changing unrelated toAST callers.

Context: #94681 (comment)

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

EXPLAIN SYNTAX now formats operators as function calls consistently in explain output (for example plus(1, 2) instead of 1 + 2).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

No documentation changes needed. This is an output-formatting consistency change in EXPLAIN output and does not change query semantics.


Note

Low Risk
Primarily changes EXPLAIN/toAST formatting behavior and golden test expectations; low risk to query semantics but may affect user-facing explain output and tooling that parses it.

Overview
Changes EXPLAIN formatting for analyzer-backed paths. InterpreterExplainQuery now formats query-tree-derived AST with an explicit allow_operators setting (to distinguish EXPLAIN SYNTAX vs EXPLAIN QUERY TREE output), and routes formatting through the extended toAST options.

Adds constant rendering control in query-tree → AST conversion. Introduces ConvertToASTOptions.use_source_expression_for_constants and updates ConstantNode::toASTImpl to emit the original source_expression when requested, preserving expected literal/constant appearance in explain output.

Updates tests to the new canonical explain strings. Numerous stateless query references and analyzer pass tests are adjusted to match the new operator/function-call formatting and several .sql tests explicitly enable the analyzer.

Reviewed by Cursor Bugbot for commit 16e0d24. Bugbot is set up for automated code reviews on this repo. Configure here.

Version info

  • Merged into: 26.6.1.756

@CLAassistant

CLAassistant commented Jan 20, 2026

Copy link
Copy Markdown

@1abdelhalim 1abdelhalim force-pushed the fix-94603-operators-as-functions branch from ed887dd to 072c87c Compare January 20, 2026 17:18
@rschu1ze rschu1ze added the can be tested Allows running workflows for external contributors label Jan 20, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [5bd9444]

Summary:


AI Review

Summary

This PR makes EXPLAIN SYNTAX render operator expressions in canonical function form through FormatStateStacked::allow_operators = false, and scopes ConstantNode source-expression rendering to the EXPLAIN SYNTAX query-tree conversion path so EXPLAIN QUERY TREE dump_ast = 1 keeps showing the post-pass tree. The previous inline concerns about -- { echo } reference drift, EXPLAIN QUERY TREE semantics, and the misleading comment are fixed in the current code; I did not find any new correctness, safety, or compatibility issues that warrant inline comments.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Jan 20, 2026
Comment thread tests/queries/0_stateless/03800_explain_syntax_operators_as_functions.sql Outdated
@rschu1ze

Copy link
Copy Markdown
Member

The test to code ratio is 99:1. I like that.

@rschu1ze

Copy link
Copy Markdown
Member

@novikd Like to check? (I'm not too familiar with the analyzer internals)

@novikd novikd self-assigned this Jan 20, 2026
@rschu1ze

Copy link
Copy Markdown
Member

@1abdelhalim The next step is to go through all test failures and fix them, thanks.

@1abdelhalim 1abdelhalim force-pushed the fix-94603-operators-as-functions branch 3 times, most recently from 4adca3d to f0d81b3 Compare January 20, 2026 23:03

@novikd novikd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the problem is that it won't pass the tests with disabled analyzer.

Comment thread src/Analyzer/FunctionNode.cpp Outdated
This implements the core fix for issue ClickHouse#94603 by modifying
FunctionNode::toASTImpl() to always format operators as function calls,
and adds comprehensive tests.
Update all stateless test reference files to reflect the new operator
formatting behavior where operators are displayed as function calls
(e.g., plus(a, b) instead of a + b) in EXPLAIN SYNTAX and toAST output.

This includes updates to both .reference and .oldanalyzer.reference files
to match the new formatting behavior across all test scenarios.
@1abdelhalim 1abdelhalim force-pushed the fix-94603-operators-as-functions branch from f5d07f7 to 4a4ffcf Compare January 24, 2026 21:39
Per reviewer feedback:
- Remove redundant 'function_ast->is_operator = false;' assignment
  (is_operator defaults to false in ASTFunction)
- Remove is_operator field from FunctionNode class since it's no longer
  needed (operators are always formatted as functions)
- Make markAsOperator() a no-op to maintain API compatibility
…nment

Restore is_operator field to maintain compatibility with existing code.
The key change remains: removed redundant 'function_ast->is_operator = false;'
assignment in toASTImpl() since is_operator defaults to false in ASTFunction.

This addresses the reviewer's main feedback while ensuring no breakage.
While is_operator defaults to false in ASTFunction, explicitly setting it
ensures correct behavior. The assignment is kept for safety and correctness.
Restore the original comments that were in the working commit to ensure
exact code match and avoid any potential issues.
Restore use_source_expression_for_constants option and related changes
that were present in the working 66-commit version but lost during cleanup.
These changes are needed for Fast test to pass.
@alexey-milovidov

Copy link
Copy Markdown
Member

@1abdelhalim, ok, more test references have to be updated.
Also, when the test does not pass with disabled analyzer, add SET enable_analyzer = 1; to the test.

Per reviewer feedback: add SET enable_analyzer = 1; to tests that don't pass
with disabled analyzer.
Fix unit tests that were failing because toAST() now formats operators
as functions. Update expected strings to use function notation:
- LIKE/ILIKE operators → like()/ilike() functions
- Comparison operators → less(), greater(), equals(), etc.
- Logical operators → and(), or(), not() functions
Fix 9 failing stateless tests by updating .reference files to convert
operators to function notation in EXPLAIN SYNTAX and EXPLAIN QUERY TREE output:

- 00597_push_down_predicate_long: Convert *, +, and = operators
- 01271_optimize_arithmetic_operations_in_aggr_func_long: Convert +, -, *, / operators (all instances)
- 01798_uniq_theta_sketch: Convert % operator and -x to negate(x)
- 02226_analyzer_or_like_combine: Convert OR operators to or() function and fix SETTINGS spacing
- 02479_mysql_connect_to_self: Convert _CAST to cast
- 02967_parallel_replicas_join_algo_and_analyzer_1/2/3: Convert _CAST and = operators
- 03255_parallel_replicas_join_algo_and_analyzer_4: Convert = operator

All operators are now formatted as functions as required by ClickHouse#94603.
- 00597_push_down_predicate_long: Convert _CAST and < operators
- 01271_optimize_arithmetic_operations_in_aggr_func_long: Convert remaining +, -, *, / operators in EXPLAIN SYNTAX output
@1abdelhalim 1abdelhalim force-pushed the fix-94603-operators-as-functions branch from dce07ce to effc6e7 Compare January 25, 2026 08:05
- 01798_uniq_theta_sketch: Convert bitNot(-x) to bitNot(negate(x))
- 02479_mysql_connect_to_self: Change cast('default', 'String') to currentDatabase() in EXPLAIN output
- 02967_parallel_replicas_join_algo_and_analyzer_1/2: Convert cast() to _CAST() in trace output
…YNTAX

The old-analyzer `Stateless tests` job compared against stale
`.oldanalyzer.reference` files.

For tests where a top-level `SET enable_analyzer = 1` was added, the old-analyzer
job now produces the same output as the new analyzer (the setting overrides the
job default), so the separate `.oldanalyzer.reference` is meaningless and can
never match — it is removed so the runner falls back to `.reference`.

For the analyzer-dependent tests that do not pin the setting
(`00826_cross_to_inner_join`, `01029_early_constant_folding`,
`02226_analyzer_or_like_combine`, `02428_parameterized_view`,
`02477_is_null_parser`), the `.oldanalyzer.reference` files were stale (they
predated the operator-to-function and redundant-parenthesis-suppression changes)
and are regenerated to match the current `EXPLAIN SYNTAX` output.
…-as-functions

# Conflicts:
#	tests/queries/0_stateless/02428_parameterized_view.reference
These changes to CI workflow / infrastructure files were unrelated to the
purpose of this pull request and were introduced accidentally (stale base
branch). Restoring them to match `master`.
@alexey-milovidov alexey-milovidov removed the manual approve Manual approve required to run CI label May 31, 2026
Comment thread src/Interpreters/InterpreterExplainQuery.cpp Outdated
alexey-milovidov and others added 4 commits May 31, 2026 09:12
… form

The new test `04105_explain_syntax_parameterized_view` (added in `master`
via ClickHouse#103249) was merged into this branch. Its `EXPLAIN SYNTAX` output prints
comparison operators, which this PR rewrites into their function form
(`number > 10` -> `greater(number, 10)`, `n = 1` -> `equals(n, 1)`). Regenerate
both the default and `.oldanalyzer` references from the built binary so they
match the function-form output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…SYNTAX

The comment claimed that `EXPLAIN SYNTAX` prefers operators over function
calls, but the code sets `allow_operators = false` for that path, so the
actual contract is the opposite: function calls are preferred over operator
syntax. Flip the wording to match the code and avoid preserving the wrong
invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… calls in function-call form

The master regression test `04241_inconsistent_ast_codec_tuple_paren_round_trip_1941_1bfa`
(STID 1941-1bfa) was added after this branch last synced and asserts the
narrower master fix, which only suppressed the redundant `parenthesized`
flag's parens for multi-argument `Function_tuple` in `allow_operators = false`
contexts (`CODEC` / `STATISTICS` / `BACKUP_NAME`).

This PR generalizes that suppression in `decideParensEmission` and
`ASTFunction::formatImplWithoutAlias` to every function call in
function-call form, so the inner `not(...)` paren is also dropped:
`not((not(tuple(...))))` now formats as `not(not(tuple(...)))`. Both forms
denote the same AST and the format-parse-format round-trip remains stable
(the round-trip check still fires `UNKNOWN_CODEC` / `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`,
not `Inconsistent AST formatting`), so this is a strictly cleaner output in the
same direction as the original fix.

Verified against the freshly built binary that every line of the test
reproduces, including the server-side round-trip assertions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master (the branch was ~687 commits behind and red) and fixed the one CI failure.

04241_inconsistent_ast_codec_tuple_paren_round_trip_1941_1bfa (Fast test). This regression test for STID 1941-1bfa was added to master after this branch last synced, so it only came in via the merge. It asserts master's narrower fix, which suppressed the redundant parenthesized-flag parens only for multi-argument Function_tuple in allow_operators = false contexts. This PR generalizes that suppression (in decideParensEmission and ASTFunction::formatImplWithoutAlias) to every function call in function-call form, so the inner not(...) paren is also dropped:

  • before: CODEC(not((not(tuple(materialize(1), materialize(2))))), ZSTD)
  • after: CODEC(not(not(tuple(materialize(1), materialize(2)))), ZSTD)

Both forms denote the same AST and the format-parse-format round-trip stays stable — verified against the freshly built binary that the round-trip assertions still fire UNKNOWN_CODEC / NUMBER_OF_ARGUMENTS_DOESNT_MATCH (not Inconsistent AST formatting), and that every formatQuerySingleLine / SHOW CREATE line reproduces. Only the reference was updated (2 lines).

Note: the test's prose comments still describe master's tuple-only mechanism. The behavioral contract it documents (round-trip stability, single-argument tuple(x) unaffected) still holds, but the comment wording is now slightly narrower than the actual generalized behavior if you'd like to update it.

The two remaining unresolved review threads are both on src/Analyzer/FunctionNode.cpp from January and are outdated — that file is no longer touched; the design moved to the ConvertToASTOptions.use_source_expression_for_constants flag as @KochetovNicolai requested.

…-as-functions

# Conflicts:
#	tests/queries/0_stateless/01029_early_constant_folding.oldanalyzer.reference
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master (the branch was ~999 commits behind) — the merge was conflict-free.

Built the binary after the merge and verified the formatting paths still hold against it:

  • 03800_explain_syntax_operators_as_functions (the new test), 01566_negate_formatting, 02006_test_positional_arguments, 02035_isNull_isNotNull_format, 01029_early_constant_folding, 00736_disjunction_optimisation, 00940_order_by_read_in_order_query_plan, 02911_analyzer_order_by_read_in_order_query_plan, 04241_inconsistent_ast_codec_tuple_paren_round_trip_1941_1bfa — all pass.
  • The EXPLAIN SYNTAX tests newly merged from master also produce the canonical function-call form: 00808_not_optimize_predicate and the optimize_inverse_dictionary_lookup family (03701/03703/04201) — e.g. equals(dictGet(...), -1) instead of dictGet(...) = -1.

All review threads are resolved and CI was green before the merge; fresh CI will run on the new commit.

@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master (the branch was 532 commits behind and red) — the merge was conflict-free.

The two red checks were the known read-in-order virtual-row flake (Virtual row boundary violated in MergingSortedAlgorithm, STID 2651-3359, and the accompanying Server died), tracked in #106664. The fix 28c01518689 (Skip virtual-row boundary check when no per-source conversion applies) was already on master but not yet in this branch, so the merge brings it in and should clear those reds.

Built the binary after the merge and verified the formatting paths against it:

  • All EXPLAIN SYNTAX tests not touched by this PR (43 of them) still pass — confirming the operators-as-functions change does not break any test added to master since the last sync. Spot-checked 02035_isNull_isNotNull_format (SELECT plus(isNotNull(1), isNotNull(2))), 02454_create_table_with_custom_disk (SETTINGS disk = disk(type = local, ...)), and the new 04309_explain_insert_format against the PR binary — output matches the references exactly.
  • All tests this PR updates pass against the PR binary, including the new 03800_explain_syntax_operators_as_functions.

(The handful of local failures during verification were all due to the minimal single-node test server lacking clusters / ZooKeeper / a second remote host / custom-disk config, plus the stale system clickhouse-client being picked up for .sh tests — none were formatting regressions.)

@clickhouse-gh

clickhouse-gh Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.60% 84.60% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.20% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 48/50 (96.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov requested a review from novikd June 10, 2026 21:08

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPLAIN SYNTAX would have a slightly incorrect result, because it won't show the constant folded value.

@alexey-milovidov alexey-milovidov Jun 12, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should syntax show constant-folded values? Constant-folding is an optimization, not a syntax thing.
Gcc is often criticized for that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov Because EXPLAIN SYNTAX has always been showing an optimized query. Query Tree level optimizations will still be shown here, but constant folding is not. That is a strange behavior.

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I like it.

@alexey-milovidov alexey-milovidov dismissed novikd’s stale review June 13, 2026 17:32

Time has passed.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 13, 2026
Merged via the queue into ClickHouse:master with commit 91c23c4 Jun 13, 2026
166 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026
alexey-milovidov added a commit that referenced this pull request Jun 19, 2026
…operator-as-function formatting

After merging master, `EXPLAIN SYNTAX` renders operators in function-call form
(`a OR b` -> `or(a, b)`, `s LIKE 'x'` -> `like(s, 'x')`) following #94681
(`fix-94603-operators-as-functions`). Regenerate the affected references:

- `04357_optimize_or_like_chain_non_deterministic.reference`
- `02226_analyzer_or_like_combine.reference` (also picks up benign query-tree
  node-id renumbering for one query; the tree structure is unchanged)
- `02226_analyzer_or_like_combine.oldanalyzer.reference`

Only the formatting of the `EXPLAIN SYNTAX` output changed; the rewrites and
query semantics are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alexey-milovidov added a commit to nihalzp/ClickHouse that referenced this pull request Jun 20, 2026
…master

Master PR ClickHouse#94681 made `FunctionNode::toASTImpl` always format operators as
function calls in `EXPLAIN SYNTAX` / `toAST` (e.g. `a IN b` renders as
`in(a, b)`, `a < b` as `less(a, b)`, `a AND b` as `and(a, b)`, `a NOT LIKE b`
as `notLike(a, b)`). After merging `master`, the inverse-dictionary-lookup and
`dictGetKeys` plan dumps render in the new function-call form.

Regenerated the references. The change is purely cosmetic plan formatting:
every query result and data row is byte-identical to the previous references;
only the rendering of operators inside `EXPLAIN` plans changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EXPLAIN SYNTAX and IQueryTreeNode::toAST should format all operators as functions

7 participants