Parallelize dictGetKeys constant path and constant fold inverse dictionary lookup with dictGetKeys#91164
Conversation
dictGetKeys and parallelize dictGetKeys constant pathdictGetKeys by constant folding and parallelize dictGetKeys constant path
dictGetKeys by constant folding and parallelize dictGetKeys constant pathdictGetKeys constant path and constant fold inverse dictionary lookup with dictGetKeys
|
Merged The only CI failure is @groeneai, could you take a look at #105396 and provide a fix in a separate PR if one isn't already in progress? Latest CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=91164&sha=1102a6dbcb0f2cd3788e5c04bbdc1b1a78b76121&name_0=PR&name_1=BuzzHouse%20%28amd_debug%29 |
|
Picked up the BuzzHouse failure (STID 1941-1bfa, issue #105396). The crash on this PR is a sub-variant of the same chronic round-trip family that PR #104991 (CODEC/BACKUP_NAME, merged) and PR #105839 (query parameter substitution, open) cover. This sub-variant is parser-side: the kql_array_sort subscript hack in Fix is in PR #106691 — gates the rewrite on Once #106691 is merged the BuzzHouse run for #91164 should be unblocked. I will keep monitoring this PR's CI. |
|
Merged The only CI red was the AST fuzzer logical error @groeneai, this is the known mutations |
…ctionaryLookupPass`
The merge of `master` into this branch dropped the `#include <Access/ContextAccess.h>`
and `#include <Access/Common/AccessType.h>` includes from
`InverseDictionaryLookupPass.cpp`. As a result the call
`getContext()->getAccess()->isGranted(AccessType::CREATE_TEMPORARY_TABLE)` failed to
compile with:
error: member access into incomplete type 'element_type'
(aka 'const DB::ContextAccessWrapper')
because `getAccess` returns a `shared_ptr` to the only-forward-declared
`ContextAccessWrapper` (forward-declared in `Interpreters/Context.h`), whose member
`isGranted` requires the full definition from `Access/ContextAccess.h`.
This broke both the `Fast test` and `Build (arm_tidy)` jobs. Re-add both Access
includes (matching `master`), since `ContextAccessWrapper::isGranted` and
`AccessType::CREATE_TEMPORARY_TABLE` are both used directly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ictgetkeys # Conflicts: # tests/queries/0_stateless/03701_optimize_inverse_dictionary_lookup_basic.reference # tests/queries/0_stateless/03702_optimize_inverse_dictionary_lookup_composite_and_layouts.reference # tests/queries/0_stateless/03703_optimize_inverse_dictionary_lookup_dictget_family.reference # tests/queries/0_stateless/03713_optimize_inverse_dictionary_lookup_setting_rewrite_in_to_join.reference # tests/queries/0_stateless/04201_inverse_dictionary_lookup_edge_cases.reference
Addresses review feedback on `InverseDictionaryLookupPass`: the `isCreateTemporaryTableGranted()` check ran before the new `dictGetKeys` constant-fold path, so a user with dictionary access but without the `CREATE TEMPORARY TABLE` grant never reached the fast path, even when the rewrite is only `key = const`, `key IN [..]`, or `0` and no `dictionary()` table function is built. Only the fallback `IN (SELECT ... FROM dictionary(...))` rewrite uses the `dictionary()` table function, so move the grant check down to that branch. The constant-fold path needs only the normal dictionary access of `dictGetKeys`, so it is no longer gated by the grant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ARY TABLE Proves, via `EXPLAIN`, that a user with `dictGet` access but without the `CREATE TEMPORARY TABLE` grant still reaches the `dictGetKeys` constant fold (`key IN [..]` and `key = const`): the folded plan contains neither `dictGet` nor a `dictionary()` subquery, the `dictionary()` table function is rejected with `ACCESS_DENIED`, and the optimized results match the unoptimized ones. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…master Master PR ClickHouse#94681 made `FunctionNode::toASTImpl` always format operators as function calls in `EXPLAIN SYNTAX` / `toAST` (e.g. `a IN b` renders as `in(a, b)`, `a < b` as `less(a, b)`, `a AND b` as `and(a, b)`, `a NOT LIKE b` as `notLike(a, b)`). After merging `master`, the inverse-dictionary-lookup and `dictGetKeys` plan dumps render in the new function-call form. Regenerated the references. The change is purely cosmetic plan formatting: every query result and data row is byte-identical to the previous references; only the rendering of operators inside `EXPLAIN` plans changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged Red Fast test — references after operators-as-functionsThe 6 failing Access-gating finding (
|
The constant path of `dictGetKeys` reads the dictionary with several
streams that pull blocks from a shared counter in
`DictionarySourceCoordinator`. Each `DictGetKeysMatchingRowsSink`
accumulated the matching keys of the blocks its stream happened to
consume, and the merge appended all chunks of sink 0, then sink 1, and
so on. With a dictionary spanning more than one block this grouped the
result by worker, so the user-visible array order became
scheduler-dependent and could differ from the single-stream vector path
(`dictGetKeys(d, a, 'x')` vs `dictGetKeys(d, a, materialize('x'))`).
Tag every chunk produced by `DictionarySource` with its sequential
block number (a `DictionaryBlockNumber` chunk info) and merge the
matched chunks in ascending block-number order in `executeConstPath`.
Block `n` always covers the key range
`[n * max_block_size, (n + 1) * max_block_size)`, so this reproduces the
single-stream scan order and keeps the result deterministic and
consistent with the vector path, while preserving the parallel scan.
Add `04413_dict_get_keys_const_path_order_consistency`, which forces
many one-row blocks (`max_block_size = 1`) and `max_threads = 8` so the
constant path is genuinely parallel, then pins its result (order
included) to the vector path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ookup-dictgetkeys
|
Addressed the AI review Request changes verdict (deterministic Fix (
Also merged current The remaining unresolved thread on @groeneai, the only red on the previous run was |
|
@alexey-milovidov Confirmed: CIDB over the last 30 days: 10 master hits plus 104 distinct unrelated PRs (131 total events), from 2026-06-06 to 2026-06-30 05:02 UTC, across Master example, same check: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=cd3c529cd3a8002caa1a4ba585a39da636836784&name_0=MasterCI&name_1=AST%20fuzzer%20%28amd_debug%29&name_1=AST%20fuzzer%20%28amd_debug%29 In-flight fix: #107719. Expect it to clear on the re-run. |
… signed-zero Addresses the two blocking findings of the AI review (`❌ Block` verdict). 1. Non-coordinator dictionary layouts (`DIRECT`, `COMPLEX_KEY_DIRECT`, `POLYGON`, `REGEXP_TREE`, ...) read in a single stream and do not tag their chunks with a `DictionaryBlockNumber`. The constant path previously asserted the tag with `chassert` and then dereferenced it unconditionally, so a release build (where `chassert` does not evaluate) would dereference a null chunk info for a valid `dictGetKeys(dict, attr, const)` call. The block number is now optional: it is only read when present, and the block-number merge sort runs only for multi-stream reads (`num_streams > 1`), which come exclusively from `DictionarySourceCoordinator` and always attach the tag. Single-stream layouts keep the sink's insertion (scan) order. 2. The vector path used the raw `sipHash128AtRow` as a mandatory prefilter before the `equals` confirmation. `ColumnVector<T>::updateHashWithValue` hashes the stored bytes, so `0.0` and `-0.0` hash differently even though SQL `equals(0.0, -0.0)` is true, which made `dictGetKeys(d, 'f', CAST(-0.0 AS Float64))` and its `materialize(...)` form disagree. Floating-point attributes (optionally `Nullable`) now normalize signed zero in the bucket / cache hash on both the query-value side and the dictionary side, so SQL-equal values always share a bucket and the `equals` confirmation then decides the match. New regressions: - `04490_dict_get_keys_float_signed_zero_consistency` pins the constant and vector paths together for `Float32` / `Float64` with both zero signs in the dictionary. - `04491_dict_get_keys_const_path_non_coordinator_layout` exercises the constant path on `DIRECT` / `COMPLEX_KEY_DIRECT` (must not crash and must agree with the vector path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review 1. Blocker — non-coordinator dictionary layouts ( 2. Major — vector-path floating zero ( Locally re-ran the full The only remaining unresolved thread ( @groeneai, the sole CI red is |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 318/332 (95.78%) · Uncovered code |



Optimization of the constant path:
See #91164 (comment).
Constant folding:
This query:
Previously became:
Now:
If the rhs set of
INhas size 1, it becomes:If rhs set of
INis empty, then:Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Optimize inverse dictionary lookups. A constant equality predicate such as
WHERE dictGet(dict, attr, key_expr) = valueis now constant-folded directly into a key filter (key_expr = const,key_expr IN [...], orWHERE 0) instead of being rewritten to anIN (SELECT ... FROM dictionary(...))subquery, and the constant path ofdictGetKeysnow executes in parallel.Documentation entry for user-facing changes
Note
Medium Risk
Changes query-tree rewrite logic for
dictGetpredicates and makesdictGetKeysconstant execution multi-threaded, which can affect query planning/semantics and performance under various dictionary layouts and type-cast edge cases.Overview
Improves inverse dictionary lookup optimization by constant-folding eligible
dictGetequality predicates into a precomputed key list usingdictGetKeys, rewriting tokey = <const>,key IN [..], orWHERE 0(while preserving original nullable result types and skipping cases wheredictGet-family casting would change semantics).Refactors the fallback rewrite path to gate
dictionary()-subquery rewrites onCREATE TEMPORARY TABLEaccess only when needed, and replaces analyzer-pass based resolving with directStorageDictionary/table-function resolution.Speeds up
dictGetKeysfor constant inputs by switching the constant path from single-threaded pulling to a parallel pipeline with per-stream sinks that filter matching rows and aggregate keys, plus adds a performance test and updates multiple stateless references/RBAC regression coverage to reflect the new constant-folded plans.Reviewed by Cursor Bugbot for commit 0f7565c. Bugbot is set up for automated code reviews on this repo. Configure here.
Version info
26.7.1.293