Enforce ColumnConstPr in ActionsDAG#104890
Conversation
`ActionsDAG::Node::column` was a generic `ColumnPtr` but in practice only ever held a `ColumnConst`. Encode the invariant in the type: - `Node::column` is now `ColumnConstPtr` (= `ColumnConst::Ptr`), with `ColumnConst` only forward-declared in `ActionsDAG.h` so the header does not pull in `ColumnConst.h`. - `addColumn` and `makeAddingColumnActions` take `(ColumnConstPtr, DataTypePtr, std::string)` instead of `ColumnWithTypeAndName`; `addColumn` rejects non-zero-sized constants. - In `addFunctionImpl`, the constant produced by constant folding is rewrapped as size-zero via `ColumnConst::create(getDataColumnPtr(), 0)` before being stored on the node. - `deserializeConstant` now returns `ColumnConst::Ptr` and wraps the lambda-capture `ColumnFunction` value in a `ColumnConst` for consistency with the round-trip in `serializeConstant`. Supporting changes: - In `COW`, `mutable_ptr` / `immutable_ptr` are promoted from `protected` to `public` (their raw-pointer constructors stay private, so only `COW`/`COWHelper` friends can mint one). `COWHelper` gains its own `getPtr()` so `ColumnConst::getPtr()` returns `ColumnConst::Ptr` rather than the base `ColumnPtr`. - `ActionsDAG`'s default ctor, move ctor / assign, destructor, and `detachNodes` are out-of-lined so the `std::list<Node>` destruction point sees a complete `ColumnConst`. - All call sites of `addColumn` and `makeAddingColumnActions` are migrated to the new signature. Most do `assert_cast<const ColumnConst &>(*type->createColumnConst(0, ...)).getPtr()`, or reuse an existing `typeid_cast<const ColumnConst *>` result with `->getPtr()`. - `addPlaceholder` no longer stores a non-const column on the node. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`IDataType::createColumnConst` and `createColumnConstWithDefaultValue` previously returned `ColumnPtr`, hiding the fact that the result is always a `ColumnConst`. Switch the return type to `MutableColumnConstPtr` so the type matches what `ColumnConst::create` actually produces. - `IColumn_fwd.h` forward-declares `ColumnConst` and defines the `ColumnConstPtr` / `MutableColumnConstPtr` aliases. To allow these pointers to be destroyed in TUs that only see the forward declaration, `intrusive_ptr_add_ref/release(const ColumnConst *)` are declared in the fwd header and defined in `ColumnConst.cpp` as forwarders to the `IColumn` versions. - `IDataType.h` mirrors the same forward declaration/alias pair locally. - `DataTypeNullable` updates its override signature accordingly. Most call sites stay the same because `MutableColumnConstPtr` converts implicitly (via move) to either `ColumnPtr` or `ColumnConstPtr`. A handful of sites that were copying the result are switched to `std::move` or to an explicit `ColumnPtr` variable, and a few sites that relied on `auto` deduction now spell the type out so the implicit conversion takes place. `Columns/ColumnConst.h` is added to the TUs whose surface uses `ColumnConst` directly (member access, upcast to `ColumnPtr`, `ColumnWithTypeAndName` construction from the result). These genuinely need the complete type — the new ADL declarations only cover the destruction path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ConstantValue` stored the wrapped constant as `ColumnPtr` even though the column is always a `ColumnConst`. Switch the field and `getColumn()` to `ColumnConstPtr`, and update `ConstantNode::getColumn` to match. `PlannerActionsVisitorImpl::addConstantIfNecessary` now takes `(ColumnConstPtr column, DataTypePtr type, std::string name, ...)` instead of a `ColumnWithTypeAndName`, matching the recent `addColumn` signature. The two call sites pass the constant directly and only materialize a `ColumnWithTypeAndName` for `addInputConstantColumnIfNecessary` when there is more than one stack frame to update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously addColumn threw LOGICAL_ERROR if the input ColumnConst was not size 0. Instead, transparently rewrap the data column with size 0 so callers don't have to remember the convention. The size in a DAG-node constant is meaningless — execution recomputes it from the block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| throw Exception(ErrorCodes::LOGICAL_ERROR, "No Set is passed as the second argument for function '{}'", getName()); | ||
| } | ||
|
|
||
| auto set = future_set->get(); |
There was a problem hiding this comment.
input_rows_count == 0 is now checked only after future_set->get(). That changes non-dry_run behavior for empty blocks: with a not-ready set we now throw LOGICAL_ERROR, while previously this path returned an empty column without touching the set.
Empty chunks can still appear in execution, so this can become a false failure even though there is nothing to evaluate. Can we keep the old fast path for non-dry_run empty input (e.g. return early before future_set->get()) and keep the new empty-set constant logic only for dry_run header evaluation?
`formatConstant` bailed out to `node->result_name` when the column was `empty()`, but since constants in DAG nodes are now stored at size 0 that branch fires for every constant. Read the value from the underlying data column instead. Also restore a default-valued ColumnConst on placeholder nodes so they have a representative column. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`FunctionCapture` (the IFunctionBase backing a lambda) inherited the default `isSuitableForConstantFolding() == true`, so the analyzer would happily fold `arrayMap(x -> f(x), [...const array...])` even when `f` was non-deterministic or referenced an IN set whose RHS hadn't been built yet. The folded value (computed with `dry_run=true`) would be cached on the DAG node and reused at execution time, producing wrong results — most visibly for non-deterministic WASM UDFs whose dry-run returns defaults (test `04212_bare_function_name_as_lambda_wasm`). Walk the lambda's inner DAG and propagate the existing constant-folding suitability up: - A FUNCTION node disqualifies the lambda if its `function_base` reports `isSuitableForConstantFolding() == false`. - A COLUMN node disqualifies the lambda if it wraps a `ColumnSet` whose `FutureSet` isn't ready — same check `getFunctionArguments` already performs for direct children of a function node. - ARRAY_JOIN and PLACEHOLDER nodes inside a lambda also disqualify it. This lets `addFunctionImpl`'s `all_const` check fall through cleanly: with the lambda's `node.column` left null, the outer higher-order function isn't const-folded either, and the inner functions run for real at execution time. While here: also stop forwarding the outer `dry_run` flag into the lambda body in `ExecutableFunctionExpression::executeImpl`. With the suitability check above, this can't reach a non-deterministic call through constant folding anymore, but reverting the propagation keeps the lambda boundary opaque (matches pre-PR behaviour) and avoids similar surprises if anyone else ever dry-runs through a lambda. Adjust `evaluateConstantExpression` to apply `cloneResized(1)` after both the new- and old-analyzer branches: DAG nodes now hold size-zero ColumnConsts (per the `addColumn` invariant), so the existing `empty()` check would otherwise reject a perfectly valid constant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix clang-tidy errors in arm_tidy build: - Drop std::move on ColumnPtr passed to ConstantValue's const-ref ctor - Use !empty() instead of size() != 0 in ActionsDAG - Add missing ColumnConst.h include in gtest_stats.cpp so the MutableColumnConstPtr -> ColumnPtr conversion sees the inheritance Also simplify several call sites now that ActionsDAG::Node::column is always a ColumnConst when non-null. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104890&sha=21ced25b2696e6d3de79c2ff0fffebc7b3cedd3a&name_0=PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::addPlaceholder` used to set `node.column` to `createColumnConstWithDefaultValue(0)` so the "Node::column is always a ColumnConst" invariant held for every action type. That value is bogus — the placeholder stands for an unknown correlated value rewritten to an `INPUT` later by `decorrelate`. Treating it as a real constant let `getFunctionArguments` mark functions over the placeholder as "all arguments constant", `addFunctionImpl` then folded them through `getConstantResultForNonConstArguments` and friends, and the placeholder's default value (0 for UInt64, NULL for Nullable, ...) was baked into the DAG and into headers via `ActionsDAG::updateHeader`. The fold flowed downstream into the "Create correlated subquery result alias" step (`ActionsDAG(header, duplicate_const_columns=true)` then shadowed the input with the constant), so correlated scalar subqueries, correlated `arrayMap` lambdas, and tests like `03709_tuple_inside_nullable_basic` returned the default value instead of the per-row value. - `addPlaceholder` no longer sets `node.column`. `Node::column == nullptr` is already valid for INPUT/PLACEHOLDER nodes; only the "if-set-must-be-ColumnConst" rule applies. - `getFunctionArguments` excludes PLACEHOLDER children from the `all_const` check explicitly (kept for safety even with the nullptr column above). - `evaluatePartialResult` synthesizes a non-const empty/default column for PLACEHOLDER nodes at header build time, so header computation still produces an output but no constant value can be propagated. Also fix two other places that fed size-0 ColumnConst columns to `evaluatePartialResult` with `input_rows_count == 1`: - `analyzeConstant` in `evaluateConstantExpression.cpp`: rebuild the ConjunctionMap value as a size-1 ColumnConst. `createSelector` / `createBlockSelector` use the row count, so a size-0 value made `optimize_skip_unused_shards` return an empty shard set (`01758_optimize_skip_unused_shards_once`). - `filterResultForNotMatchedRows` in `Optimizations/Utils.cpp`: build the pre-populated input as a size-1 ColumnConst for the same reason, which is what the JOIN-conversion optimizer feeds into `evaluatePartialResult` (`03583_rewrite_in_to_join`'s `c0 = ANY(SELECT 1) ? 1 : 2` case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`initializeSearchTokens` rejected `column_needles->empty()` to defend
against an unpopulated planning-time column. After `ActionsDAG::addColumn`
normalizes every constant to size 0, that check fires on every regular
const-folded `needles` argument too, so `buildImpl` builds the function
with an empty `search_tokens`. At runtime `executeImpl` short-circuits
to all-zero results when `search_tokens` is empty, so
`hasAllTokens(msg, sparseGrams('click'))` and friends matched nothing.
Restrict the empty-column guard to non-const columns. For a ColumnConst
the value lives in the data column regardless of the wrapper size, so
`(*column_needles)[0]` is safe and returns the actual needles array.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returning NULL for `transform(NULL, ['', ''], ['', ''], <NULL column>)` is valid: with the first argument typed as `Nullable(Nothing)`, `from_type->onlyNull()` is true so `initializeTransformCache` sets `cache->is_empty` and the function returns the cast default (`NULL` as `Nullable(String)`). Drop the obsolete `serverError BAD_ARGUMENTS` marker and capture the actual output in the reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`FunctionIn` disabled `useDefaultImplementationForConstants` so that its own `executeImpl` could see `input_rows_count = 0` during header computation (where the underlying set may not yet be built). After `ActionsDAG::addColumn` started normalizing every ColumnConst argument to size 0, that opt-out turned every constant-folding call into the same `input_rows_count = 0` path: `executeImpl` short-circuited to an empty non-ColumnConst column, `addFunctionImpl` dropped it, and downstream constant folding silently broke. Visible regressions included `WHERE 1 IN (1)` no longer simplifying to `WHERE 1 = 1` in `transformQueryForExternalDatabase` (and the corresponding `TransformQueryForExternalDatabase.InWith*` gtests), plus several stateless tests (`02351_Map_combinator_dist`, `02165_replicated_grouping_sets`, `00290_shard_aggregation_memory_efficient`, `03023_group_by_use_nulls_analyzer_crashes`, `03399_divide_zero_or_null`, `01398_in_tuple_func`). Return `!ignore_set` from `useDefaultImplementationForConstants` so the default wrapper kicks in for the real `in`/`notIn`/`globalIn`/... when all arguments are constants — it clones them to size 1, runs the function with `input_rows_count = 1`, and wraps the result as a ColumnConst with the original `input_rows_count`. The `-IgnoreSet` variants keep the opt-out: they exist purely for type analysis before the set is built and must not fold. Also restore `03340_transform_logical_error_fix` to expect `BAD_ARGUMENTS`. The constant-folding path now reaches `transform`'s `initializeTransformCache` with `input_rows_count = 1` again, so the existing argument validation throws as the test originally asserted. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104890&sha=f18314f1c1286134074308b98c8b5dfa1fe1c684&name_0=PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous approach (enable `useDefaultImplementationForConstants`)
fixed `WHERE 1 IN (1)` constant folding but introduced a regression on
`WHERE x IN (subquery)`: the default constant-folding wrapper runs
`executeImpl` with `dry_run = true` during `FilterTransform`'s header
evaluation, before `CreatingSets` builds the subquery's set. The `!set`
branch returns `ColumnUInt8(input_rows_count, 0)` and the default
wrapper then wraps it as a `ColumnConst(0)` — at which point
`FilterTransform::prepare` sees `always_false` and skips the filter
entirely, swallowing every row of e.g.
SELECT 1 AS number FROM (SELECT 1) WHERE number IN (SELECT 1)
(visible in `00132_sets`, `01015_empty_in_inner_right_join`,
`02376_analyzer_in_function_subquery`, `03708_join_to_in_output_header`,
many other fast tests).
Switch back to `useDefaultImplementationForConstants = false` and handle
the size-0 ColumnConst inputs produced by `ActionsDAG::addColumn`
directly: when the left argument is a ColumnConst we now unwrap to its
size-1 data column before checking `input_rows_count`, so `1 IN (1)`
still folds even when the constant carries `size = 0`. Subquery sets
that aren't yet built still return an empty non-ColumnConst column, so
the DAG keeps the IN expression in place and the filter runs after
`CreatingSets`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With our `ActionsDAG::Node::column` ColumnConst-only invariant, `transform`'s `executeImpl` no longer hits the original `default_non_const->empty()` path that the test was written to assert: both `arguments[1]` and `arguments[3]` arrive as size-0 ColumnConsts, the `size() > size()` size check is false, `cache->is_empty` is set from `from_type->onlyNull()` and the function returns the cast `Nullable(String)` default — NULL — as documented behavior. Returning NULL for an all-NULL `transform` call is consistent with the function's contract (NULL key matches nothing, default is NULL), so drop the `serverError BAD_ARGUMENTS` marker and capture the actual output in the reference. The original regression the test was guarding against (logical-error crash) is fixed by the column-folding pipeline itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ScopeStack::addColumn` already required its `column.column` to be a ColumnConst (it does `assert_cast<const ColumnConst &>(*column.column)` before forwarding to `ActionsDAG::addColumn`), but it accepted a `ColumnWithTypeAndName` so the constness was checked at runtime via an `assert_cast`. Pass `ColumnConstPtr` plus `DataTypePtr` and a name directly, mirroring `ActionsDAG::addColumn` and removing the runtime cast. Update the four call sites — IN-set column, joinGet/dictGet table column, parameterized-view placeholder, and literal — to construct their ColumnConsts up front. The parameterized-view path now uses `createColumnConstWithDefaultValue` (the previous `ColumnWithTypeAndName(type, name)` constructor used `type->createColumn()`, which produced a non-const column and would have tripped the `assert_cast` if ever reached). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::addColumn` normalizes ColumnConst nodes to size 0, so the existing \`value->column->size() != 1\` check always failed once the constant flowed through the DAG. The prefix extraction was skipped, and \`SELECT ... FROM system.zookeeper WHERE path LIKE 'prefix%'\` would silently fall back to traversing the whole tree from "/" when \`allow_unrestricted_reads_from_keeper = 1\`. The neighbouring \`equals\` and \`in\` branches were already updated for size-0 constants. Match them: drop the size check and read the underlying value via \`getDataAt(0)\` (ColumnConst forwards to its data column regardless of the wrapper size). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::deserializeConstant` for `DataTypeFunction` built the
deserialized `ColumnFunction` with `elements_size = 1` even though
every captured column comes back as a size-0 ColumnConst (matching the
size-0 invariant `ActionsDAG::addColumn` now enforces for DAG node
columns).
The mismatch is invisible until a higher-order function (e.g.
`arrayMap`) calls `convertToFullColumnIfConst` on the deserialized
ColumnConst<ColumnFunction>: `ColumnConst::convertToFullColumn`
forwards to `ColumnFunction::replicate(Offsets{0})` (offsets of size 1
with value 0), `ColumnFunction::replicate` walks the captured columns
and calls `ColumnConst::replicate(offsets)` on each — and that throws
`Size of offsets (1) doesn't match size of column (0)` because the
captures are size 0 while `elements_size = 1`.
The crash shows up on the receiving side of distributed queries that
ship a query plan with a captured lambda (test
`02351_Map_combinator_dist` etc): the ExpressionStep constructor calls
`ActionsDAG::updateHeader`, which dry-runs `arrayMap` through
`evaluatePartialResult`.
Build the ColumnFunction with `elements_size = 0` so the wrapper and
its captures agree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build the deserialized ColumnFunction with its original layout: keep \`elements_size = 1\` (required because the result is wrapped in a ColumnConst, whose constructor enforces \`data->size() == 1\`) and \`cloneResized(1)\` every captured column right after recursive \`deserializeConstant\` returns it. The recursive call yields size-0 ColumnConsts to honor the DAG node invariant; left as-is, they tripped \`ColumnFunction::replicate\`'s per-capture \`ColumnConst::replicate\` when \`convertToFullColumnIfConst\` ran on the deserialized lambda during \`ExpressionStep\`'s \`updateHeader\` on the remote side. This is the same layout \`FunctionCaptureOverloadResolver::executeImpl\` produces in its constant-folding branch (\`cloneResized(1)\` captures inside a \`ColumnFunction::create(1, ...)\`), so the deserialized column behaves identically to one built locally. Reproducer (e.g. \`02351_Map_combinator_dist\`): run with \`serialize_query_plan = 1\` and \`prefer_localhost_replica = 0\`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
\`ActionsDAG::addColumn\` normalizes ColumnConst arguments to size 0 so the DAG-side invariant holds, but not every function is implemented correctly for size-0 constant inputs. Concrete offender: \`moduloOrNull\`/\`divideOrNull\`/etc. — \`FunctionBinaryArithmetic\` opts out of \`useDefaultImplementationForConstants\`, so when \`addFunctionImpl\` runs the function over the DAG's size-0 ColumnConsts the per-row null-map loop iterates zero times, the divide-by-zero null bit is silently dropped, and \`SELECT moduloOrNull(1, 0)\` const-folds to \`0\` instead of \`\\N\` (visible with the old analyzer, where the entire constant-folding pass goes through ActionsVisitor). Resize ColumnConst arguments to size 1 inside \`getFunctionArguments\` so each function that goes through \`addFunctionImpl\` sees a properly-shaped constant input, mirroring what \`IExecutableFunction::defaultImplementationForConstantArguments\` does internally for opted-in functions. While here: - Drop the explicit \`PLACEHOLDER\` check; \`addPlaceholder\` no longer assigns \`node.column\`, so the new \`typeid_cast<ColumnConst>\` filter catches placeholders via \`column == nullptr\`. - Reuse the cast pointer instead of an extra \`isColumnConst\` virtual call before checking for ColumnSet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- src/Storages/MergeTree/Streaming/CursorUtils.cpp: \`ActionsDAG::addColumn\` signature changed to take \`(ColumnConstPtr, DataTypePtr, std::string)\` in this branch; pass the column / type / name directly instead of the old \`ColumnWithTypeAndName\` constructor form. - src/Functions/getFuzzerData.cpp: \`createColumnConst\` returns \`MutableColumnConstPtr\` now. \`ColumnPtr\` (immutable_ptr<IColumn>) doesn't see the \`ColumnConst : IColumn\` inheritance through the forward declaration in \`IColumn_fwd.h\`, so the implicit conversion fails. Include the full \`<Columns/ColumnConst.h>\` definition (same fix already applied to \`gtest_stats.cpp\`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DAG node columns are now \`ColumnConstPtr\` (and always ColumnConst when non-null), and \`CompileDAG::Node::column\` was changed to \`ColumnConstPtr\` in the previous "Simplify" commit. Drop the \`isColumnConst(*node.column)\` / \`typeid_cast<const ColumnConst *>(node.column.get())\` helpers that are now redundant: - \`ActionsDAG::makeConvertingActions\`: the source \`dst_node->column\` is already a ColumnConstPtr; check for it directly and read the field via \`getField()\`. - \`ExpressionJIT::isCompilableConstant\`: collapse to a \`!= nullptr\` check on \`node.column\`. - \`CompileDAG::dump\`: read \`node.column->getDataColumn()\` directly. - \`evaluateConstantExpression\`: drop \`isColumnConst\` guard inside the \`ActionType::COLUMN\` branch — by invariant it always holds. - \`resolveFunction\`: bind the \`typeid_cast\` result once and reuse it instead of casting twice (still need to detect the const case because \`column\` is the result of \`executable_function->execute\`, not a DAG node). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| @@ -38,7 +38,7 @@ ConstantNode::ConstantNode(ConstantValue constant_value_) | |||
| {} | |||
|
|
|||
| ConstantNode::ConstantNode(ColumnPtr constant_column_, DataTypePtr value_data_type_) | |||
There was a problem hiding this comment.
I guess the price we pay here by taking a copy here (const ref would be enough) is not worth the refactor of changing the signature. If by any mean it causes any issues, we can do it.
There was a problem hiding this comment.
Probably I will do. Nice catch!
| COW(const COW&) = default; | ||
|
|
||
| protected: | ||
| public: |
There was a problem hiding this comment.
This seems like a code smell (exposing to much of the internals) for the first look, but I have to check out why is this necessary.
There was a problem hiding this comment.
I think I realized that we exposed these because of {Mutable,}ColumnConstPtr. Cannot we forward declare ColumnConst and expose only the ColumnConst pointers?
There was a problem hiding this comment.
Unfortunately no. The typed ptr is a nested class of IColumn, and in C++, it's not possible to forward-declare a nested class. The best I (and Claude) could do is include COW, forward-declare the class, and forward-declare the internal pointer.
| { col, | ||
| std::make_shared<DataTypeString>(), | ||
| "" |
There was a problem hiding this comment.
| { col, | |
| std::make_shared<DataTypeString>(), | |
| "" | |
| { | |
| col, | |
| std::make_shared<DataTypeString>(), | |
| "" |
|
|
||
| /// Add dimension and reference vector as last two arguments | ||
| auto dimension_column = DataTypeUInt64().createColumnConst(1, qbit_dimension); | ||
| ColumnPtr dimension_column = DataTypeUInt64().createColumnConst(1, qbit_dimension); |
There was a problem hiding this comment.
I am curious why these explicit conversions are necessary. I will check it out locally.
There was a problem hiding this comment.
Probably it is not, just a wrong attempt to fix a build.
|
|
||
| ActionsDAG::ActionsDAG() = default; | ||
| ActionsDAG::ActionsDAG(ActionsDAG &&) noexcept = default; | ||
| ActionsDAG & ActionsDAG::operator=(ActionsDAG &&) noexcept = default; | ||
| ActionsDAG::~ActionsDAG() = default; | ||
|
|
||
| ActionsDAG::Nodes ActionsDAG::detachNodes(ActionsDAG && dag) { return std::move(dag.nodes); } |
There was a problem hiding this comment.
Nit pick: can we move these after the empty name space (right before ActionsDAG::isDeterministic)?
| bool is_deterministic_constant = true; | ||
| /// For COLUMN node and propagated constants. | ||
| ColumnPtr column; | ||
| /// For COLUMN node and propagated constants. Always ColumnConst. |
There was a problem hiding this comment.
Can we also mention the invariant of column()->size() == 0?
| auto is_const = [](const ActionsDAG::Node & n) | ||
| { | ||
| return n.type == ActionsDAG::ActionType::COLUMN && n.column && isColumnConst(*n.column); | ||
| return n.type == ActionsDAG::ActionType::COLUMN && n.column; |
There was a problem hiding this comment.
This should be enough, right? All ActionType::COLUMN should contain a column, right?
| return n.type == ActionsDAG::ActionType::COLUMN && n.column; | |
| return n.type == ActionsDAG::ActionType::COLUMN; |
| auto name = "const_" + DB::toString(literal_counter++); | ||
|
|
||
| const auto & node = dag->addColumn(std::move(column)); | ||
| const auto & node = dag->addColumn(std::move(col), type, std::move(name)); |
There was a problem hiding this comment.
| const auto & node = dag->addColumn(std::move(col), type, std::move(name)); | |
| const auto & node = dag->addColumn(std::move(col), std::move(type), std::move(name)); |
| SELECT 1 | ||
| WHERE 0 | ||
| SELECT 1 | ||
| WHERE 1 IN (0, 1, 2) |
There was a problem hiding this comment.
Isn't this a problem? Or in the old analyzer we already optimize this away?
There was a problem hiding this comment.
I think it just removes WHERE 1. I don't exactly know why it could not be removed before, but it looks legit.
| if (set->getTotalRowCount() == 0) | ||
| return ColumnConst::create(ColumnUInt8::create(1, negative), input_rows_count); | ||
|
|
||
| if (input_rows_count == 0) |
There was a problem hiding this comment.
I don't really understand this change. Do you remember why it was necessary?
There was a problem hiding this comment.
This exact code prevented constant-folding.
Ideally, I would prefer that fiction with input_rows_count==0, and all const args return a constant column with the correct value.
For now, I dropped this idea temporarily, since not all the functions actually do it correctly.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 772/890 (86.74%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 26 line(s) · Uncovered code |
|
📊 Cloud Performance Report 🔴 AI verdict: TPC-H Q4 shows a clear 32% slowdown (3258ms vs 2460ms), confirmed by both tests and large enough to stand on its own. This PR reworks how correlated subqueries are planned — it stops materializing a synthetic constant for PLACEHOLDER (correlated) columns and changes constant folding in ActionsDAG. Q4 is the order-priority query built around a correlated EXISTS subquery, which is exactly the decorrelation path being changed, so a real planning-level regression here is plausible and worth a look. The other benchmarks were not flagged. clickbench🟢 No significant changes tpch_adapted_1_official🔴 1 regressed Flagged queries (1 of 22)q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. Debug info
|

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Main changes