Add eval table function#104396
Conversation
962cf52 to
003efb1
Compare
|
I've addressed the review comments above. The CI is still failing, but the remaining failed tests appear to be unrelated to this change. Could you please take another look when you have a moment? Thanks. |
003efb1 to
5f730c6
Compare
…n or a subquery returning single string row.
Run the normal set-operation normalization visitors on the generated `SELECT` before passing it to `StorageView`. This keeps generated `UNION ALL`, `UNION DISTINCT`, `INTERSECT`, and `EXCEPT` queries on the same path as regular queries and avoids planner exceptions from `UNION_DEFAULT`.
…ntalSettings`, add `eval` to the aspell dictionary, and make nested `eval` detection use a case-sensitive comparison.
5f730c6 to
86e74bb
Compare
Allow `StorageView` to assign a temporary table identifier for the `eval` table function before replacing it with the generated subquery. This avoids a `LOGICAL_ERROR` from `StorageView::replaceWithSubquery` for queries such as `SELECT * FROM eval(...)`.
Avoid resolving nested `eval` table function arguments on the initiator when they are used under `remote` or `cluster`. Like `view`, `eval` can generate a `SELECT` that references tables available only on shards. Reuse the `view` distributed rewrite path for table functions that expose an inner `SELECT`, and add an integration regression test for a remote-only table. Addresses reviewer feedback on ClickHouse#104396 (comment)
Permit `eval(SELECT ...)` in `QueryNormalizer` for the old analyzer path, matching the parser and `TableFunctionEval` behavior. Add a stateless regression test with `enable_analyzer` disabled. Addresses reviewer feedback on ClickHouse#104396 (comment)
Document that `eval` is normally analyzed on the initiator, but nested usage under `remote` or `cluster` is resolved on remote shards. Addresses reviewer feedback on ClickHouse#104396 (comment)
|
This was fixed by #105146. Let's update the branch. |
Drop the old `kql` table function layer left in `ExpressionListParsers.cpp` after merging `master`. The layer referenced removed `ParserKQLTableFunction` code and broke the parser build, while current `master` parses `kql` through the normal table function path. Issue: ClickHouse#101293 Tests: - `ninja -C build clickhouse_parsers` - `git diff --check -- src/Parsers/ExpressionListParsers.cpp` - `build/programs/clickhouse local --multiquery < tests/queries/0_stateless/04241_kql_array_index_no_oom.sql`
# Conflicts: # ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt
The AST fuzzer found that an `eval` argument containing a bare `SELECT` query node — e.g. a nested `eval(SELECT ...)` used as a scalar sub-expression — made `evaluateConstantQueryText` call `evaluateConstantExpression`, which in turn calls `IAST::getColumnName` on the bare `ASTSelectWithUnionQuery`. That node has no column name, so it raised `Trying to get name of not a column: SelectWithUnionQuery` (`LOGICAL_ERROR`). Reject such arguments up front with a clear `BAD_ARGUMENTS` exception. A properly wrapped scalar subquery (`ASTSubquery`) remains a valid part of a constant expression and is left untouched. CI report (AST fuzzer, amd_debug, targeted): ClickHouse#104396 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`04210_eval_table_function` and `04324_eval_table_function_udf_name` were
added assuming the new analyzer is the default and failed in the
`old analyzer` CI job:
* `04210`: `SELECT count() FROM eval('SELECT number FROM numbers(3) SETTINGS
limit = 1')` returned `3` instead of `1`, because the `limit` setting
embedded in the generated query is only scoped to the inner query under the
new analyzer. Pin that query to `enable_analyzer = 1`.
* `04324`: `SELECT eval(1) FORMAT TSVWithNames` produced the column name
`plus(1, 1)` instead of `eval(1)`, and `eval('SELECT count() FROM
numbers(eval(1))')` threw `BAD_ARGUMENTS` because the old analyzer does not
fold the SQL UDF `eval` inside a table-function argument. Make the
new-analyzer baseline explicit with `SET enable_analyzer = 1`; the
old-analyzer column name is still checked by the explicit `SET
enable_analyzer = 0` case.
Also add a regression case for the nested-argument `LOGICAL_ERROR` fixed in the
previous commit.
CI reports (Stateless tests, old analyzer):
ClickHouse#104396
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expose `ITableFunction::hasShardSideResolvedQueryArguments` and use it in distributed query analysis and serialized-plan selection instead of hard-coded `view` and `eval` checks. Mark `view`, `viewIfPermitted`, and `eval` as requiring shard-side query argument resolution, and add remote `viewIfPermitted` coverage.
Apply `ITableFunction::hasShardSideResolvedQueryArguments` in the normal analyzer path for nested table-function arguments. This keeps functions such as `viewIfPermitted` from resolving shard-only subqueries on the initiator when used under `remote`. Add a regression test for `remote` with `viewIfPermitted` and the analyzer enabled.
Reject generated `SELECT` queries with `FORMAT`, `INTO OUTFILE`, or output compression options in table function `eval`, because these top-level output options are not handled when the query is executed through `StorageView`. Add stateless coverage for `FORMAT Null` and `INTO OUTFILE` generated queries.
Apply `SETTINGS` from the query generated by `eval` while inferring the schema and executing the backing `StorageView`.
Detect table functions with shard-side resolved query arguments through table-expression arguments, so wrappers such as `loop(eval(...))` keep the generated query on the shard side.
…esHistory section After merging `master`, the development version advanced from 26.6 to 26.7. The `02995_new_settings_history` test compares against a v26.6.1 baseline and only treats a setting as documented when its `SettingsChangesHistory` entry has a version strictly greater than 26.6. The `allow_experimental_eval_table_function` entry was in the 26.6 section, so the test would report `PLEASE ADD THE NEW SETTING TO SettingsChangesHistory.cpp`. Move it to the 26.7 section to reflect the version it is actually introduced in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged the latest Post-merge fix: the development version advanced from 26.6 to 26.7 while this branch was open, and The merge itself was conflict-free. I verified all 12 changed translation units (including the new Previous CI failures were all unrelated to this PR:
CI will re-run on the updated branch. |
The shard-side wrapper skip path in `QueryAnalyzer::resolveTableFunction`
deferred a wrapper such as `loop(eval(q))` to the remote shard without first
resolving the inner `eval` table function's non-query expression arguments.
The direct case `remote(..., eval(q))` already resolves `q` on the initiator
before deferring, because the outer `WITH` alias list is stripped when the
table function AST is sent to the shard. But the wrapper case
`remote(..., loop(eval(q)))` took the `hasShardSideResolvedTableFunctionArguments`
branch and shipped `eval(q)` to the shard with `q` still unresolved, so the
shard raised `UNKNOWN_IDENTIFIER` even though direct `eval(q)` and literal
`loop(eval('...'))` are both supported.
Recursively resolve the non-query expression arguments of any `eval` nested
inside shard-side wrappers before taking the skip path, descending through the
table-expression arguments of the wrapper (so `loop(loop(eval(q)))` is handled
too). Add focused distributed regressions for `remote('remote', loop(eval(q)))`
with and without `serialize_query_plan = 1`.
Addresses the AI review finding on `src/Analyzer/Resolve/QueryAnalyzer.cpp`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review "Request changes" finding on `src/Analyzer/Resolve/QueryAnalyzer.cpp` in 82cc5e9. Problem: the shard-side wrapper skip path deferred a wrapper such as `loop(eval(q))` to the remote shard without first resolving the inner `eval`'s non-query expression arguments. The direct case `remote(..., eval(q))` already resolves `q` on the initiator (the outer `WITH` alias list is stripped when the AST is shipped to the shard), but `remote(..., loop(eval(q)))` took the `hasShardSideResolvedTableFunctionArguments` branch and sent `eval(q)` to the shard with `q` unresolved → `UNKNOWN_IDENTIFIER`. Fix: before taking the wrapper skip path, recursively resolve the non-query expression arguments of any `eval` nested inside the wrapper, descending through the wrapper's table-expression arguments (so `loop(loop(eval(q)))` is covered too). Tests: added `test_remote_loop_eval_resolves_with_alias_argument_on_initiator` and its `serialize_query_plan = 1` variant to `test_eval_table_function`. Verification (local debug build, single node acting as both initiator and shard via `remote('127.0.0.1:9100', ...)`):
CI: the only red was `Stress test (amd_tsan)`/`(arm_tsan)` → "Hung check failed, possible deadlock found", which the CI summary itself links to the open master-wide flake #107941; all builds and the rest of the suite were green. A fresh CI run is triggered by this push. |
…ce and execution The `eval` table function builds its inner `StorageView` from a generated query that may carry its own `SETTINGS enable_analyzer = ...`. The generated query's structure is inferred in `TableFunctionEval::getActualTableStructure` using that inner context (settings copied from the generated query), but `StorageView::readImpl` selected the interpreter (`InterpreterSelectQueryAnalyzer` vs `InterpreterSelectWithUnionQuery`) from the outer query's context. For a generated `SELECT ... SETTINGS enable_analyzer = 0` under an analyzer-enabled outer query, the structure was therefore inferred with the old analyzer while execution used the new analyzer. The two disagree on, for example, scalar subquery column names (`_subquery1` for the old analyzer vs `_subquery_1` for the new one), so the executed result could not be converted to the inferred structure. Build the view's execution context first and select the analyzer mode from it. For ordinary views the inner and outer settings are the same, so behaviour is unchanged; for `eval` the execution path now matches the schema-inference path. Added regression `04491_eval_table_function_analyzer_mode`. Addresses the AI review "Request changes" finding on `src/Storages/StorageView.cpp`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review "Request changes" finding on Problem: Fix: build the view's execution context first and select the analyzer mode from This is the analyzer-enabled outer path only. Under the old analyzer an Test: added The one CI red on 82cc5e9 was |
| if (!function || !function->arguments || function->arguments->children.size() <= 1) | ||
| return {}; | ||
|
|
||
| const auto * table_function_argument = function->arguments->children[1]->as<ASTFunction>(); |
There was a problem hiding this comment.
This only marks positional remote('addr', eval(...)) / cluster('cluster', eval(...)) arguments as table-expression arguments, but parseArguments also supports the named-collection override form remote(collection, database=<table_function>) and already treats database=merge(...) as a table function (see the existing 02046_remote_table_function_named_collections coverage).
In that syntax argument 1 is the equals(database, ...) wrapper, so this method returns {}. The analyzer then misses the new shard-side handling for eval / view under remote, and non-query eval arguments are not resolved before the AST is shipped to the shard. A query like WITH 'SELECT x FROM remote_only_eval_table' AS q SELECT * FROM remote(remote1, database=eval(q)) SETTINGS allow_experimental_eval_table_function = 1, enable_analyzer = 1 can still send eval(q) to the shard without the outer WITH binding and fail with UNKNOWN_IDENTIFIER, even though the positional remote(..., eval(q)) case is fixed.
Please extend the metadata/analyzer path to handle the named override table-function value as well, and add a regression for remote(named_collection, database=eval(q)) (and ideally the serialize_query_plan = 1 variant).
…c scalar subquery name) The regression added in the previous commit used an unaliased scalar subquery (`SELECT (SELECT 42)`) as the generated query's output column. Under the old analyzer the column of an unaliased subquery is named `_subquery<N>` from a global `static std::atomic_uint64_t subquery_index` in `QueryAliasesVisitor`, so the name produced while inferring the structure in `TableFunctionEval::getActualTableStructure` and the name produced again while executing the inner query in `StorageView::readImpl` are different (`_subquery102` vs another counter value). The two never match, so the test failed with `Unknown column: _subquery102, there are only columns .` (`UNKNOWN_IDENTIFIER`). This is not specific to `eval`: a plain `CREATE VIEW v AS SELECT (SELECT 42)` fails the same way under both analyzers, because view-structure inference and view read are separate interpreter runs and the subquery counter advances between them. It is therefore the wrong construct to test analyzer-mode consistency with. Replace it with a deterministic divergence that still exercises the `StorageView::readImpl` analyzer-mode fix: `SELECT 1 AS a, abs(a)`. The old analyzer substitutes the alias into the auto-generated name of the second column (`abs(1)`), while the new analyzer keeps the alias (`abs(a)`); both names are stable across runs (no global counter). With the generated query pinning `enable_analyzer = 0` under an analyzer-enabled outer query, the structure is inferred as `abs(1)`, so execution must use the old analyzer too. Without the fix `readImpl` would re-analyze with the new analyzer, produce `abs(a)`, and fail to resolve the inferred `abs(1)` column. CI report (Fast test): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104396&sha=a7c529941d71bfda429053eafb90483df6ba1f2a&name_0=PR&name_1=Fast%20test PR: ClickHouse#104396 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Fixed the Root cause: the regression I added in the previous commit used an unaliased scalar subquery ( Fix: replaced it with a deterministic divergence that still exercises the Remaining AI-review finding ( |
`ApplyWithSubqueryVisitor` substituted `WITH` literal aliases into the arguments of every `ASTFunction` named `eval`, keyed only by the function name. Under the old analyzer this visitor runs before `QueryNormalizer` applies `prefer_column_name_to_alias`, so a scalar SQL UDF named `eval` used in an expression context was rewritten too: with a source column `q`, a UDF `CREATE FUNCTION eval AS x -> x + 1`, and `WITH 10 AS q SELECT eval(q) FROM t SETTINGS enable_analyzer = 0, prefer_column_name_to_alias = 1`, the argument became `eval(10)` (result `11`) instead of using the source column (`eval(1)`, result `2`). The substitution is now driven from the table-function position (`ASTTableExpression::table_function`) via `substituteWithLiteralsInEvalTableFunction`, which descends into nested table functions (e.g. `loop(eval(q))`) but stops at subqueries, so only an actual `eval` table-function occurrence is rewritten. A scalar SQL UDF with the same name in an ordinary expression context is left for `QueryNormalizer`, which honours `prefer_column_name_to_alias`. Added `04492_eval_scalar_udf_prefer_column` covering the scalar SQL UDF named `eval` with `prefer_column_name_to_alias` under both analyzers, and asserting that the `eval` table function still substitutes `WITH` aliases. Addresses the AI review finding on `src/Interpreters/ApplyWithSubqueryVisitor.cpp`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review "Request changes" finding on Problem: under the old analyzer, Fix: the substitution is now driven from the table-function position ( Test: added Verification (local Debug build): the six covered queries return The remaining AI-review finding ( |
|
Merged This is a pure The two |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 450/481 (93.56%) · Uncovered code |

Description
eval#101293evalfor constructing a query from a constant expression or anotherSELECTquery.Changelog category (leave one):
Changelog entry:
eval, which evaluates a constant expression or single-rowSELECTquery to a query string and executes the resulting singleSELECTquery. The feature is disabled by default and can be enabled with settingallow_experimental_eval_table_function.Documentation entry for user-facing changes