Enforce ColumnConstPr in ActionsDAG by KochetovNicolai · Pull Request #104890 · ClickHouse/ClickHouse · GitHub
Skip to content

Enforce ColumnConstPr in ActionsDAG#104890

Merged
KochetovNicolai merged 37 commits into
masterfrom
actions-dag-node-column-const-ptr
Jun 5, 2026
Merged

Enforce ColumnConstPr in ActionsDAG#104890
KochetovNicolai merged 37 commits into
masterfrom
actions-dag-node-column-const-ptr

Conversation

@KochetovNicolai

@KochetovNicolai KochetovNicolai commented May 13, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Main changes

  • Forward-declare ColumnConstPtr
  • Use ColumnConstPtr in ActionsDAG and enforce if
  • Return ColumnConstPtr from IDataType::createColumn

`ActionsDAG::Node::column` was a generic `ColumnPtr` but in practice only
ever held a `ColumnConst`. Encode the invariant in the type:

- `Node::column` is now `ColumnConstPtr` (= `ColumnConst::Ptr`), with
  `ColumnConst` only forward-declared in `ActionsDAG.h` so the header
  does not pull in `ColumnConst.h`.
- `addColumn` and `makeAddingColumnActions` take
  `(ColumnConstPtr, DataTypePtr, std::string)` instead of
  `ColumnWithTypeAndName`; `addColumn` rejects non-zero-sized constants.
- In `addFunctionImpl`, the constant produced by constant folding is
  rewrapped as size-zero via `ColumnConst::create(getDataColumnPtr(), 0)`
  before being stored on the node.
- `deserializeConstant` now returns `ColumnConst::Ptr` and wraps the
  lambda-capture `ColumnFunction` value in a `ColumnConst` for
  consistency with the round-trip in `serializeConstant`.

Supporting changes:

- In `COW`, `mutable_ptr` / `immutable_ptr` are promoted from `protected`
  to `public` (their raw-pointer constructors stay private, so only
  `COW`/`COWHelper` friends can mint one). `COWHelper` gains its own
  `getPtr()` so `ColumnConst::getPtr()` returns `ColumnConst::Ptr` rather
  than the base `ColumnPtr`.
- `ActionsDAG`'s default ctor, move ctor / assign, destructor, and
  `detachNodes` are out-of-lined so the `std::list<Node>` destruction
  point sees a complete `ColumnConst`.
- All call sites of `addColumn` and `makeAddingColumnActions` are
  migrated to the new signature. Most do
  `assert_cast<const ColumnConst &>(*type->createColumnConst(0, ...)).getPtr()`,
  or reuse an existing `typeid_cast<const ColumnConst *>` result with
  `->getPtr()`.
- `addPlaceholder` no longer stores a non-const column on the node.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label May 13, 2026
Comment thread src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp Outdated
Comment thread src/Planner/PlannerJoinsLogical.cpp Outdated
Comment thread src/Planner/PlannerActionsVisitor.cpp Outdated
Comment thread src/Interpreters/ActionsVisitor.cpp Outdated
KochetovNicolai and others added 5 commits May 19, 2026 15:44
`IDataType::createColumnConst` and `createColumnConstWithDefaultValue`
previously returned `ColumnPtr`, hiding the fact that the result is
always a `ColumnConst`. Switch the return type to
`MutableColumnConstPtr` so the type matches what `ColumnConst::create`
actually produces.

- `IColumn_fwd.h` forward-declares `ColumnConst` and defines the
  `ColumnConstPtr` / `MutableColumnConstPtr` aliases. To allow these
  pointers to be destroyed in TUs that only see the forward declaration,
  `intrusive_ptr_add_ref/release(const ColumnConst *)` are declared in
  the fwd header and defined in `ColumnConst.cpp` as forwarders to the
  `IColumn` versions.
- `IDataType.h` mirrors the same forward declaration/alias pair locally.
- `DataTypeNullable` updates its override signature accordingly.

Most call sites stay the same because `MutableColumnConstPtr` converts
implicitly (via move) to either `ColumnPtr` or `ColumnConstPtr`. A
handful of sites that were copying the result are switched to
`std::move` or to an explicit `ColumnPtr` variable, and a few sites that
relied on `auto` deduction now spell the type out so the implicit
conversion takes place.

`Columns/ColumnConst.h` is added to the TUs whose surface uses
`ColumnConst` directly (member access, upcast to `ColumnPtr`,
`ColumnWithTypeAndName` construction from the result). These genuinely
need the complete type — the new ADL declarations only cover the
destruction path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ConstantValue` stored the wrapped constant as `ColumnPtr` even though
the column is always a `ColumnConst`. Switch the field and
`getColumn()` to `ColumnConstPtr`, and update `ConstantNode::getColumn`
to match.

`PlannerActionsVisitorImpl::addConstantIfNecessary` now takes
`(ColumnConstPtr column, DataTypePtr type, std::string name, ...)`
instead of a `ColumnWithTypeAndName`, matching the recent `addColumn`
signature. The two call sites pass the constant directly and only
materialize a `ColumnWithTypeAndName` for `addInputConstantColumnIfNecessary`
when there is more than one stack frame to update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously addColumn threw LOGICAL_ERROR if the input ColumnConst was
not size 0. Instead, transparently rewrap the data column with size 0
so callers don't have to remember the convention. The size in a
DAG-node constant is meaningless — execution recomputes it from the
block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Functions/in.cpp
throw Exception(ErrorCodes::LOGICAL_ERROR, "No Set is passed as the second argument for function '{}'", getName());
}

auto set = future_set->get();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_rows_count == 0 is now checked only after future_set->get(). That changes non-dry_run behavior for empty blocks: with a not-ready set we now throw LOGICAL_ERROR, while previously this path returned an empty column without touching the set.

Empty chunks can still appear in execution, so this can become a false failure even though there is nothing to evaluate. Can we keep the old fast path for non-dry_run empty input (e.g. return early before future_set->get()) and keep the new empty-set constant logic only for dry_run header evaluation?

`formatConstant` bailed out to `node->result_name` when the column was
`empty()`, but since constants in DAG nodes are now stored at size 0
that branch fires for every constant. Read the value from the
underlying data column instead.

Also restore a default-valued ColumnConst on placeholder nodes so they
have a representative column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Interpreters/ActionsVisitor.cpp Outdated
KochetovNicolai and others added 13 commits May 20, 2026 13:55
`FunctionCapture` (the IFunctionBase backing a lambda) inherited the
default `isSuitableForConstantFolding() == true`, so the analyzer would
happily fold `arrayMap(x -> f(x), [...const array...])` even when `f`
was non-deterministic or referenced an IN set whose RHS hadn't been
built yet. The folded value (computed with `dry_run=true`) would be
cached on the DAG node and reused at execution time, producing wrong
results — most visibly for non-deterministic WASM UDFs whose dry-run
returns defaults (test `04212_bare_function_name_as_lambda_wasm`).

Walk the lambda's inner DAG and propagate the existing constant-folding
suitability up:

- A FUNCTION node disqualifies the lambda if its `function_base`
  reports `isSuitableForConstantFolding() == false`.
- A COLUMN node disqualifies the lambda if it wraps a `ColumnSet`
  whose `FutureSet` isn't ready — same check `getFunctionArguments`
  already performs for direct children of a function node.
- ARRAY_JOIN and PLACEHOLDER nodes inside a lambda also disqualify it.

This lets `addFunctionImpl`'s `all_const` check fall through cleanly:
with the lambda's `node.column` left null, the outer higher-order
function isn't const-folded either, and the inner functions run for
real at execution time.

While here: also stop forwarding the outer `dry_run` flag into the
lambda body in `ExecutableFunctionExpression::executeImpl`. With the
suitability check above, this can't reach a non-deterministic call
through constant folding anymore, but reverting the propagation keeps
the lambda boundary opaque (matches pre-PR behaviour) and avoids
similar surprises if anyone else ever dry-runs through a lambda.

Adjust `evaluateConstantExpression` to apply `cloneResized(1)` after
both the new- and old-analyzer branches: DAG nodes now hold size-zero
ColumnConsts (per the `addColumn` invariant), so the existing
`empty()` check would otherwise reject a perfectly valid constant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix clang-tidy errors in arm_tidy build:
- Drop std::move on ColumnPtr passed to ConstantValue's const-ref ctor
- Use !empty() instead of size() != 0 in ActionsDAG
- Add missing ColumnConst.h include in gtest_stats.cpp so the
  MutableColumnConstPtr -> ColumnPtr conversion sees the inheritance

Also simplify several call sites now that ActionsDAG::Node::column
is always a ColumnConst when non-null.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104890&sha=21ced25b2696e6d3de79c2ff0fffebc7b3cedd3a&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::addPlaceholder` used to set `node.column` to
`createColumnConstWithDefaultValue(0)` so the "Node::column is always a
ColumnConst" invariant held for every action type. That value is bogus —
the placeholder stands for an unknown correlated value rewritten to an
`INPUT` later by `decorrelate`. Treating it as a real constant let
`getFunctionArguments` mark functions over the placeholder as
"all arguments constant", `addFunctionImpl` then folded them through
`getConstantResultForNonConstArguments` and friends, and the
placeholder's default value (0 for UInt64, NULL for Nullable, ...) was
baked into the DAG and into headers via `ActionsDAG::updateHeader`. The
fold flowed downstream into the "Create correlated subquery result
alias" step (`ActionsDAG(header, duplicate_const_columns=true)` then
shadowed the input with the constant), so correlated scalar subqueries,
correlated `arrayMap` lambdas, and tests like
`03709_tuple_inside_nullable_basic` returned the default value instead
of the per-row value.

- `addPlaceholder` no longer sets `node.column`. `Node::column == nullptr`
  is already valid for INPUT/PLACEHOLDER nodes; only the
  "if-set-must-be-ColumnConst" rule applies.
- `getFunctionArguments` excludes PLACEHOLDER children from the
  `all_const` check explicitly (kept for safety even with the nullptr
  column above).
- `evaluatePartialResult` synthesizes a non-const empty/default column
  for PLACEHOLDER nodes at header build time, so header computation
  still produces an output but no constant value can be propagated.

Also fix two other places that fed size-0 ColumnConst columns to
`evaluatePartialResult` with `input_rows_count == 1`:

- `analyzeConstant` in `evaluateConstantExpression.cpp`: rebuild the
  ConjunctionMap value as a size-1 ColumnConst.
  `createSelector` / `createBlockSelector` use the row count, so a
  size-0 value made `optimize_skip_unused_shards` return an empty shard
  set (`01758_optimize_skip_unused_shards_once`).
- `filterResultForNotMatchedRows` in `Optimizations/Utils.cpp`: build
  the pre-populated input as a size-1 ColumnConst for the same reason,
  which is what the JOIN-conversion optimizer feeds into
  `evaluatePartialResult` (`03583_rewrite_in_to_join`'s
  `c0 = ANY(SELECT 1) ? 1 : 2` case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`initializeSearchTokens` rejected `column_needles->empty()` to defend
against an unpopulated planning-time column. After `ActionsDAG::addColumn`
normalizes every constant to size 0, that check fires on every regular
const-folded `needles` argument too, so `buildImpl` builds the function
with an empty `search_tokens`. At runtime `executeImpl` short-circuits
to all-zero results when `search_tokens` is empty, so
`hasAllTokens(msg, sparseGrams('click'))` and friends matched nothing.

Restrict the empty-column guard to non-const columns. For a ColumnConst
the value lives in the data column regardless of the wrapper size, so
`(*column_needles)[0]` is safe and returns the actual needles array.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returning NULL for `transform(NULL, ['', ''], ['', ''], <NULL column>)`
is valid: with the first argument typed as `Nullable(Nothing)`,
`from_type->onlyNull()` is true so `initializeTransformCache` sets
`cache->is_empty` and the function returns the cast default
(`NULL` as `Nullable(String)`). Drop the obsolete `serverError
BAD_ARGUMENTS` marker and capture the actual output in the reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`FunctionIn` disabled `useDefaultImplementationForConstants` so that its
own `executeImpl` could see `input_rows_count = 0` during header
computation (where the underlying set may not yet be built). After
`ActionsDAG::addColumn` started normalizing every ColumnConst argument
to size 0, that opt-out turned every constant-folding call into the
same `input_rows_count = 0` path: `executeImpl` short-circuited to an
empty non-ColumnConst column, `addFunctionImpl` dropped it, and
downstream constant folding silently broke. Visible regressions
included `WHERE 1 IN (1)` no longer simplifying to `WHERE 1 = 1` in
`transformQueryForExternalDatabase` (and the corresponding
`TransformQueryForExternalDatabase.InWith*` gtests), plus several
stateless tests (`02351_Map_combinator_dist`, `02165_replicated_grouping_sets`,
`00290_shard_aggregation_memory_efficient`, `03023_group_by_use_nulls_analyzer_crashes`,
`03399_divide_zero_or_null`, `01398_in_tuple_func`).

Return `!ignore_set` from `useDefaultImplementationForConstants` so the
default wrapper kicks in for the real `in`/`notIn`/`globalIn`/... when
all arguments are constants — it clones them to size 1, runs the
function with `input_rows_count = 1`, and wraps the result as a
ColumnConst with the original `input_rows_count`. The `-IgnoreSet`
variants keep the opt-out: they exist purely for type analysis before
the set is built and must not fold.

Also restore `03340_transform_logical_error_fix` to expect
`BAD_ARGUMENTS`. The constant-folding path now reaches `transform`'s
`initializeTransformCache` with `input_rows_count = 1` again, so the
existing argument validation throws as the test originally asserted.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104890&sha=f18314f1c1286134074308b98c8b5dfa1fe1c684&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/System/StorageSystemZooKeeper.cpp
KochetovNicolai and others added 10 commits May 26, 2026 11:30
The previous approach (enable `useDefaultImplementationForConstants`)
fixed `WHERE 1 IN (1)` constant folding but introduced a regression on
`WHERE x IN (subquery)`: the default constant-folding wrapper runs
`executeImpl` with `dry_run = true` during `FilterTransform`'s header
evaluation, before `CreatingSets` builds the subquery's set. The `!set`
branch returns `ColumnUInt8(input_rows_count, 0)` and the default
wrapper then wraps it as a `ColumnConst(0)` — at which point
`FilterTransform::prepare` sees `always_false` and skips the filter
entirely, swallowing every row of e.g.

    SELECT 1 AS number FROM (SELECT 1) WHERE number IN (SELECT 1)

(visible in `00132_sets`, `01015_empty_in_inner_right_join`,
`02376_analyzer_in_function_subquery`, `03708_join_to_in_output_header`,
many other fast tests).

Switch back to `useDefaultImplementationForConstants = false` and handle
the size-0 ColumnConst inputs produced by `ActionsDAG::addColumn`
directly: when the left argument is a ColumnConst we now unwrap to its
size-1 data column before checking `input_rows_count`, so `1 IN (1)`
still folds even when the constant carries `size = 0`. Subquery sets
that aren't yet built still return an empty non-ColumnConst column, so
the DAG keeps the IN expression in place and the filter runs after
`CreatingSets`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With our `ActionsDAG::Node::column` ColumnConst-only invariant,
`transform`'s `executeImpl` no longer hits the original
`default_non_const->empty()` path that the test was written to assert:
both `arguments[1]` and `arguments[3]` arrive as size-0 ColumnConsts,
the `size() > size()` size check is false, `cache->is_empty` is set
from `from_type->onlyNull()` and the function returns the cast
`Nullable(String)` default — NULL — as documented behavior.

Returning NULL for an all-NULL `transform` call is consistent with the
function's contract (NULL key matches nothing, default is NULL), so
drop the `serverError BAD_ARGUMENTS` marker and capture the actual
output in the reference. The original regression the test was guarding
against (logical-error crash) is fixed by the column-folding pipeline
itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ScopeStack::addColumn` already required its `column.column` to be a
ColumnConst (it does `assert_cast<const ColumnConst &>(*column.column)`
before forwarding to `ActionsDAG::addColumn`), but it accepted a
`ColumnWithTypeAndName` so the constness was checked at runtime via an
`assert_cast`. Pass `ColumnConstPtr` plus `DataTypePtr` and a name
directly, mirroring `ActionsDAG::addColumn` and removing the runtime
cast. Update the four call sites — IN-set column, joinGet/dictGet table
column, parameterized-view placeholder, and literal — to construct
their ColumnConsts up front. The parameterized-view path now uses
`createColumnConstWithDefaultValue` (the previous
`ColumnWithTypeAndName(type, name)` constructor used `type->createColumn()`,
which produced a non-const column and would have tripped the
`assert_cast` if ever reached).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::addColumn` normalizes ColumnConst nodes to size 0, so the
existing \`value->column->size() != 1\` check always failed once the
constant flowed through the DAG. The prefix extraction was skipped,
and \`SELECT ... FROM system.zookeeper WHERE path LIKE 'prefix%'\`
would silently fall back to traversing the whole tree from "/" when
\`allow_unrestricted_reads_from_keeper = 1\`.

The neighbouring \`equals\` and \`in\` branches were already updated for
size-0 constants. Match them: drop the size check and read the
underlying value via \`getDataAt(0)\` (ColumnConst forwards to its data
column regardless of the wrapper size).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ActionsDAG::deserializeConstant` for `DataTypeFunction` built the
deserialized `ColumnFunction` with `elements_size = 1` even though
every captured column comes back as a size-0 ColumnConst (matching the
size-0 invariant `ActionsDAG::addColumn` now enforces for DAG node
columns).

The mismatch is invisible until a higher-order function (e.g.
`arrayMap`) calls `convertToFullColumnIfConst` on the deserialized
ColumnConst<ColumnFunction>: `ColumnConst::convertToFullColumn`
forwards to `ColumnFunction::replicate(Offsets{0})` (offsets of size 1
with value 0), `ColumnFunction::replicate` walks the captured columns
and calls `ColumnConst::replicate(offsets)` on each — and that throws
`Size of offsets (1) doesn't match size of column (0)` because the
captures are size 0 while `elements_size = 1`.

The crash shows up on the receiving side of distributed queries that
ship a query plan with a captured lambda (test
`02351_Map_combinator_dist` etc): the ExpressionStep constructor calls
`ActionsDAG::updateHeader`, which dry-runs `arrayMap` through
`evaluatePartialResult`.

Build the ColumnFunction with `elements_size = 0` so the wrapper and
its captures agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build the deserialized ColumnFunction with its original layout: keep
\`elements_size = 1\` (required because the result is wrapped in a
ColumnConst, whose constructor enforces \`data->size() == 1\`) and
\`cloneResized(1)\` every captured column right after recursive
\`deserializeConstant\` returns it. The recursive call yields size-0
ColumnConsts to honor the DAG node invariant; left as-is, they tripped
\`ColumnFunction::replicate\`'s per-capture \`ColumnConst::replicate\`
when \`convertToFullColumnIfConst\` ran on the deserialized lambda
during \`ExpressionStep\`'s \`updateHeader\` on the remote side.

This is the same layout \`FunctionCaptureOverloadResolver::executeImpl\`
produces in its constant-folding branch (\`cloneResized(1)\` captures
inside a \`ColumnFunction::create(1, ...)\`), so the deserialized column
behaves identically to one built locally.

Reproducer (e.g. \`02351_Map_combinator_dist\`): run with
\`serialize_query_plan = 1\` and \`prefer_localhost_replica = 0\`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
\`ActionsDAG::addColumn\` normalizes ColumnConst arguments to size 0 so
the DAG-side invariant holds, but not every function is implemented
correctly for size-0 constant inputs. Concrete offender:
\`moduloOrNull\`/\`divideOrNull\`/etc. — \`FunctionBinaryArithmetic\`
opts out of \`useDefaultImplementationForConstants\`, so when
\`addFunctionImpl\` runs the function over the DAG's size-0
ColumnConsts the per-row null-map loop iterates zero times, the
divide-by-zero null bit is silently dropped, and
\`SELECT moduloOrNull(1, 0)\` const-folds to \`0\` instead of \`\\N\`
(visible with the old analyzer, where the entire constant-folding pass
goes through ActionsVisitor).

Resize ColumnConst arguments to size 1 inside \`getFunctionArguments\`
so each function that goes through \`addFunctionImpl\` sees a
properly-shaped constant input, mirroring what
\`IExecutableFunction::defaultImplementationForConstantArguments\` does
internally for opted-in functions.

While here:
- Drop the explicit \`PLACEHOLDER\` check; \`addPlaceholder\` no longer
  assigns \`node.column\`, so the new \`typeid_cast<ColumnConst>\`
  filter catches placeholders via \`column == nullptr\`.
- Reuse the cast pointer instead of an extra \`isColumnConst\` virtual
  call before checking for ColumnSet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- src/Storages/MergeTree/Streaming/CursorUtils.cpp: \`ActionsDAG::addColumn\`
  signature changed to take \`(ColumnConstPtr, DataTypePtr, std::string)\`
  in this branch; pass the column / type / name directly instead of the
  old \`ColumnWithTypeAndName\` constructor form.

- src/Functions/getFuzzerData.cpp: \`createColumnConst\` returns
  \`MutableColumnConstPtr\` now. \`ColumnPtr\` (immutable_ptr<IColumn>)
  doesn't see the \`ColumnConst : IColumn\` inheritance through the
  forward declaration in \`IColumn_fwd.h\`, so the implicit conversion
  fails. Include the full \`<Columns/ColumnConst.h>\` definition (same
  fix already applied to \`gtest_stats.cpp\`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@KochetovNicolai KochetovNicolai marked this pull request as ready for review May 28, 2026 14:11
@antaljanosbenjamin antaljanosbenjamin self-assigned this May 28, 2026
KochetovNicolai and others added 2 commits May 28, 2026 15:24
DAG node columns are now \`ColumnConstPtr\` (and always ColumnConst when
non-null), and \`CompileDAG::Node::column\` was changed to
\`ColumnConstPtr\` in the previous "Simplify" commit. Drop the
\`isColumnConst(*node.column)\` / \`typeid_cast<const ColumnConst *>(node.column.get())\`
helpers that are now redundant:

- \`ActionsDAG::makeConvertingActions\`: the source \`dst_node->column\`
  is already a ColumnConstPtr; check for it directly and read the field
  via \`getField()\`.
- \`ExpressionJIT::isCompilableConstant\`: collapse to a \`!= nullptr\`
  check on \`node.column\`.
- \`CompileDAG::dump\`: read \`node.column->getDataColumn()\` directly.
- \`evaluateConstantExpression\`: drop \`isColumnConst\` guard inside the
  \`ActionType::COLUMN\` branch — by invariant it always holds.
- \`resolveFunction\`: bind the \`typeid_cast\` result once and reuse it
  instead of casting twice (still need to detect the const case because
  \`column\` is the result of \`executable_function->execute\`, not a
  DAG node).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@antaljanosbenjamin antaljanosbenjamin self-requested a review June 1, 2026 09:13
Comment thread src/Analyzer/ConstantNode.cpp Outdated
@@ -38,7 +38,7 @@ ConstantNode::ConstantNode(ConstantValue constant_value_)
{}

ConstantNode::ConstantNode(ColumnPtr constant_column_, DataTypePtr value_data_type_)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the price we pay here by taking a copy here (const ref would be enough) is not worth the refactor of changing the signature. If by any mean it causes any issues, we can do it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I will do. Nice catch!

Comment thread src/Common/COW.h
COW(const COW&) = default;

protected:
public:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a code smell (exposing to much of the internals) for the first look, but I have to check out why is this necessary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I realized that we exposed these because of {Mutable,}ColumnConstPtr. Cannot we forward declare ColumnConst and expose only the ColumnConst pointers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. The typed ptr is a nested class of IColumn, and in C++, it's not possible to forward-declare a nested class. The best I (and Claude) could do is include COW, forward-declare the class, and forward-declare the internal pointer.

Comment thread src/Functions/toBool.cpp Outdated
Comment on lines 53 to 55
{ col,
std::make_shared<DataTypeString>(),
""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{ col,
std::make_shared<DataTypeString>(),
""
{
col,
std::make_shared<DataTypeString>(),
""


/// Add dimension and reference vector as last two arguments
auto dimension_column = DataTypeUInt64().createColumnConst(1, qbit_dimension);
ColumnPtr dimension_column = DataTypeUInt64().createColumnConst(1, qbit_dimension);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why these explicit conversions are necessary. I will check it out locally.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it is not, just a wrong attempt to fix a build.

Comment thread src/Interpreters/ActionsDAG.cpp Outdated
Comment on lines +45 to +51

ActionsDAG::ActionsDAG() = default;
ActionsDAG::ActionsDAG(ActionsDAG &&) noexcept = default;
ActionsDAG & ActionsDAG::operator=(ActionsDAG &&) noexcept = default;
ActionsDAG::~ActionsDAG() = default;

ActionsDAG::Nodes ActionsDAG::detachNodes(ActionsDAG && dag) { return std::move(dag.nodes); }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick: can we move these after the empty name space (right before ActionsDAG::isDeterministic)?

Comment thread src/Interpreters/ActionsDAG.h Outdated
bool is_deterministic_constant = true;
/// For COLUMN node and propagated constants.
ColumnPtr column;
/// For COLUMN node and propagated constants. Always ColumnConst.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also mention the invariant of column()->size() == 0?

Comment thread src/Storages/MergeTree/KeyCondition.cpp Outdated
auto is_const = [](const ActionsDAG::Node & n)
{
return n.type == ActionsDAG::ActionType::COLUMN && n.column && isColumnConst(*n.column);
return n.type == ActionsDAG::ActionType::COLUMN && n.column;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be enough, right? All ActionType::COLUMN should contain a column, right?

Suggested change
return n.type == ActionsDAG::ActionType::COLUMN && n.column;
return n.type == ActionsDAG::ActionType::COLUMN;

auto name = "const_" + DB::toString(literal_counter++);

const auto & node = dag->addColumn(std::move(column));
const auto & node = dag->addColumn(std::move(col), type, std::move(name));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto & node = dag->addColumn(std::move(col), type, std::move(name));
const auto & node = dag->addColumn(std::move(col), std::move(type), std::move(name));

SELECT 1
WHERE 0
SELECT 1
WHERE 1 IN (0, 1, 2)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a problem? Or in the old analyzer we already optimize this away?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it just removes WHERE 1. I don't exactly know why it could not be removed before, but it looks legit.

Comment thread src/Functions/in.cpp
if (set->getTotalRowCount() == 0)
return ColumnConst::create(ColumnUInt8::create(1, negative), input_rows_count);

if (input_rows_count == 0)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this change. Do you remember why it was necessary?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exact code prevented constant-folding.
Ideally, I would prefer that fiction with input_rows_count==0, and all const args return a constant column with the correct value.

For now, I dropped this idea temporarily, since not all the functions actually do it correctly.

@clickhouse-gh

clickhouse-gh Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.50% 84.40% -0.10%
Functions 92.40% 92.40% +0.00%
Branches 77.00% 77.00% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 772/890 (86.74%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 26 line(s) · Uncovered code

Full report · Diff report

@clickhouse-gh

clickhouse-gh Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

📊 Cloud Performance Report

🔴 AI verdict: degradation1 query(s) regressed out of 39 analysed

TPC-H Q4 shows a clear 32% slowdown (3258ms vs 2460ms), confirmed by both tests and large enough to stand on its own. This PR reworks how correlated subqueries are planned — it stops materializing a synthetic constant for PLACEHOLDER (correlated) columns and changes constant folding in ActionsDAG. Q4 is the order-priority query built around a correlated EXISTS subquery, which is exactly the decorrelation path being changed, so a real planning-level regression here is plausible and worth a look. The other benchmarks were not flagged.

clickbench

🟢 No significant changes

tpch_adapted_1_official

🔴 1 regressed

Flagged queries (1 of 22)
Query Verdict Baseline med (ms) PR med (ms) Change q-value Hint
🔴 4 regression 2460 3258 +32.4% 0.0003 plan_change: Q4's correlated EXISTS subquery hits the reworked PLACEHOLDER/decorrelation + const-folding path this PR changes.

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

Debug info
  • StressHouse run: 9e948d03-b05b-4721-885a-4303b4d7643f
  • MIRAI run: 6f949b6f-959a-41ec-96fa-4a980088a3c6
  • PR check IDs:
    • clickbench_503091_1780666572
    • tpch_adapted_1_official_503116_1780666572
    • tpch_adapted_1_official_503125_1780666572
    • tpch_adapted_1_official_503133_1780666572

@KochetovNicolai KochetovNicolai added this pull request to the merge queue Jun 5, 2026
Merged via the queue into master with commit a77c28b Jun 5, 2026
166 of 167 checks passed
@KochetovNicolai KochetovNicolai deleted the actions-dag-node-column-const-ptr branch June 5, 2026 15:14
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants