Add LOOKUP INDEX support for MergeTree#101401
Conversation
|
No, ANY/ALL is controlled by the JOIN strictness in the query itself, not by a separate LOOKUP INDEX option. |
|
Workflow [PR], commit [eca1052] Summary: ❌
AI ReviewSummaryThis PR adds experimental table-wide Findings
Tests
Final VerdictStatus: Minimum required action: fix the |
Merging `master` pulled in commit `5188f66f7bf` ("Reject CREATE INDEX with
GRANULARITY 0"), which made `IndexDescription::getIndexFromAST` reject any
index declaration with `granularity == 0` and made
`ASTIndexDeclaration::formatImpl` always emit the `GRANULARITY` clause.
Both changes break `LOOKUP INDEX`. A lookup index is a table-wide in-memory
structure with no granularity, so the parser stores granularity 0 and rejects
an explicit `GRANULARITY` clause. The new check therefore rejected every
`CREATE TABLE ... LOOKUP INDEX ...` with `Index GRANULARITY must be a positive
integer (BAD_ARGUMENTS)`, and the unconditional format would have emitted
`GRANULARITY 0` for a lookup index, breaking the format/parse round-trip.
Fixes:
- `getIndexFromAST` skips the granularity check for lookup indexes
(`is_lookup_index`); it still guards granule-based skip indexes.
- `formatImpl` emits `GRANULARITY` only for non-lookup indexes.
- `buildLookupIndices` marks the declaration as a lookup index (and resets
granularity) before calling `getIndexFromAST`, so the exemption also applies
to the metadata re-parse path (`parseLookupIndices`), which does not set the
flag in the parser.
This restored all `LOOKUP INDEX` stateless tests (`04063`, `04064`, `04068`,
`04069`, `04244`, `04312`) that failed in the Fast test, while keeping master's
`04326_reject_create_index_granularity_zero` passing.
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101401&sha=c2f4395d6e9707ee4b48418b750ed39c4f84aaac&name_0=PR&name_1=Fast%20test
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A `table_join` lookup index serves an `ANY` join through a key-value direct join (`MergeTreeTableJoinEntity` via `DirectKeyValueJoin`), which always returns the first stored right row for a key. `join_any_take_last_row = 1` instead asks an `ANY` join to keep the last duplicate-key right row, which only the regular `HashJoin` honors. Without a guard, enabling a lookup index would silently change `ANY` join results for duplicate keys when that setting is on. Decline the lookup fast path in `tryExtractLookupJoinRightKeys` for `ANY` strictness when `join_any_take_last_row` is enabled. Since `tryMakeDirectJoinWithMergeTree` already declines `ANY`, the join then falls back to `HashJoin`, which keeps the configured semantics. The setting has no effect on `ALL`/`SEMI`/`ANTI` joins, so the lookup index is still used there. Adds `04360_lookup_index_table_join_any_take_last_row`, which asserts via `EXPLAIN` that the lookup (`DirectKeyValueJoin`) is used for `ANY` joins without the setting and declined with it (while `ALL` keeps it), and that enabling the index does not change `ANY` join results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`tryGetLookupJoinStorage` could prepare a `table_join` lookup direct join even when the right `MergeTree` table expression reads `ALIAS` columns. The lookup entity `MergeTreeTableJoinEntity` is built from physical columns only (`getAllPhysicalColumnsForLookupJoin`), but `DirectKeyValueJoin` later requests the full right-side header, including the computed alias columns. For a query like `SELECT d.val_alias FROM f JOIN d USING (id)` where `val_alias ALIAS concat(val, '!')`, the optimized path asked `MergeTreeTableJoinEntity::getSampleBlock` for `val_alias`, which is not in the cache, raising `Cannot find column 'val_alias' in table lookup cache`; the regular plan would have computed the alias. Fall back to the regular join path when the right table expression reads alias columns, so the alias is computed. Add `04401_lookup_index_alias_column` and a documentation note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked up the PR to address the open AI review finding. Fix ( The single
Verification Built locally and ran all lookup-index stateless tests against a server — CI The red checks on the reviewed commit are all unrelated master-wide flakes, confirmed against CIDB:
Not done (left for you)
|
`getInMemoryMetadataPtr(ContextPtr, bool)` now returns a `StorageMetadataHandle` whose rvalue `operator->` is explicitly deleted to prevent dereferencing a temporary handle. `hasLookupSetIndex` and `hasLookupJoinIndex` called `->getLookupIndices()` directly on the returned temporary, selecting the deleted overload and failing to compile. Bind the handle to a named local before dereferencing it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ze limits Addresses two AI review findings on `tryGetLookupJoinStorage`: - The lookup-join entity (`MergeTreeTableJoinEntity`) is built from the top-level physical columns only (`getAllPhysicalColumnsForLookupJoin`). The previous guard only rejected `ALIAS` payload columns, but the analyzer can still map a right-side column identifier to a subcolumn (e.g. `m.keys`) or a virtual column (e.g. `_part`). Those are neither alias expressions nor present in the entity, so the fast path built an entity and `DirectKeyValueJoin` later failed with `Cannot find column ... in table lookup cache` instead of falling back to the regular plan that materializes the column. Now fall back unless every mapped right-side column is served by the lookup entity. - The lookup build path never applied the per-query right-side join size limits (`max_rows_in_join` / `max_bytes_in_join`) that the regular join build enforces: a `table_join` lookup index reads and caches the whole right table, and later cache hits bypass the limits entirely. Decline the fast path when either limit is active so the regular join enforces them. Adds `04408_lookup_index_table_join_subcolumn` and `04409_lookup_index_table_join_size_limits`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The lookup fast path is disabled when parallel replicas can run on the initiator (`!canUseParallelReplicasOnInitiator()`), so when the test harness randomly enabled `enable_parallel_replicas` the physical-only join no longer used `DirectKeyValueJoin` and the test failed intermittently. Pin `enable_parallel_replicas = 0`, `max_parallel_replicas = 1` and `serialize_query_plan = 0`, matching the other lookup index tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked up the PR to address the two open AI review findings on the Build fix (
Review fixes (
New tests Flaky test fix (
Verification Built locally and ran the full lookup index suite ( |
`validateTableLookupKeys` accepted every `ASTIdentifier` as a plain lookup key, but an unquoted subcolumn such as `m.keys` is also an `ASTIdentifier` with `compound() == true`. Since `IndexDescription::initExpressionInfo` analyzes with subcolumns enabled, `LOOKUP INDEX idx (m.keys) TYPE table_join` was accepted even though the lookup entity is built only from top-level physical columns and cannot serve a subcolumn, leaving an accepted but unusable index. Reject such compound identifiers at validation time. A quoted top-level column name containing a dot stays a single-part identifier (`compound()` is false) and is still accepted. Add `04411_lookup_index_compound_key_rejected` covering `CREATE` and `ALTER ... ADD LOOKUP INDEX` for both `table_set` and `table_join`, plus the quoted-dotted-name acceptance case.
…nfig The `table_join` lookup fast path declines whenever `max_rows_in_join` or `max_bytes_in_join` is non-zero (the build does not enforce these limits, so it falls back to a regular join that does). The standard stateless config (`tests/config/users.d/limits.yaml`) sets both to a high ceiling (`10G`) "so it will not limit anything, but we test code around it", which made the fast path decline universally under Fast test and the lookup tests' `DirectKeyValueJoin` assertions failed (expected `1`, got `0`). Reset `max_rows_in_join` / `max_bytes_in_join` to `0` at session level in the tests that assert the fast path fires, so they exercise it regardless of the profile ceiling. `04409_lookup_index_table_join_size_limits` keeps its per-query non-zero limits to assert the decline path; those override the session reset.
…istory `master` advanced the development version to `26.7` and `02995_new_settings_history` now requires every setting that is new relative to the `26.6.1` baseline to be recorded in `SettingsChangesHistory.cpp` for a version greater than `26.6`. Move `allow_experimental_lookup_index` from the `26.6` section to the `26.7` section so the check passes.
|
Pushed What failed and whyThe previous run went red on six Root cause of the six: the earlier fix that declines the Commits
Verification (local, real server)Reproduced the Fast-test condition by mirroring
Design question to confirm (not changed here)The decline-on-non-zero behavior means the |
The `IN table` / `IN (SELECT key FROM table)` lookup fast path in `tryGetLookupSetFromTableExpression` builds the `Set` directly from storage (`MergeTreeData::tryGetLookupSet`) and registers it with `sets.addFromStorage`, so the right-hand table is never planned. The regular path plans that table (or its subquery) and performs a `SELECT` access check on the columns it reads (`checkAccessRights` / `checkAccessRightsForSubquery` in `PlannerJoinTree`), which the fast path skipped. A user holding `SELECT` only on the left table could therefore probe the keys of a dimension table it has no access to through `IN table`, but only when the fast path actually fires (no active `max_rows_in_set` / `max_bytes_in_set` limit). Perform the same `SELECT` access check on the lookup key columns before using the lookup `Set`, in both the direct `IN table` and the trivial `IN (SELECT key FROM table)` branches, matching the regular path's behaviour. Add `04412_lookup_index_table_set_access` covering both forms with and without access (the test resets `max_rows_in_set` / `max_bytes_in_set` to 0 so the fast path fires under the standard config). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked up the PR after the CI run resultThe Fast-test / stateless suite is green — all lookup-index tests pass, confirming the size-limit and validation fixes from the previous run landed correctly. The remaining red checks are all unrelated master-wide flakes (confirmed against CIDB):
Fix this run (
|
The `table_set` lookup fast path in `tryGetLookupSetFromTableExpression` performed the `SELECT` access check on the lookup key columns *after* calling `storage->tryGetLookupSet`, which reads the right-hand table and may populate the shared lookup `Set` cache. An unauthorized user was denied the final result, but could still force a full read and cache build of a table they cannot `SELECT`. Move the `checkAccess(AccessType::SELECT, storage_id, key_names)` call before `tryGetLookupSet` in both the direct `IN table` branch and the trivial `IN (SELECT key FROM table)` subquery branch, so access is enforced before any read happens. Access enforcement is already covered by `04412_lookup_index_table_set_access`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `Using table_set` example created `dim` with three columns (`id`, `subid`, `value`) but used `(id, subid) IN dim`. The direct `IN table` form builds the right-hand set from the table's columns, so this both fails with a column-count mismatch against the two-element left-hand tuple and does not match the two-column `LOOKUP INDEX idx_set (id, subid)`, so it never exercises the lookup index. Use a key-only `dim_keys` table for the direct `IN table` example so it is executable and actually uses the index, and show the trivial-subquery form `IN (SELECT id, subid FROM dim)` for dimension tables that also store non-key columns. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the two
The remaining red checks are unrelated master-wide flakes, not caused by this PR (which is SELECT-path only): Stress |
| storage_key_names.push_back(table_column_name_it->second); | ||
| } | ||
|
|
||
| result.storage_key_value = storage->tryGetLookupJoin(storage_key_names, planner_context->getQueryContext()); |
There was a problem hiding this comment.
The lookup join path can read and cache columns the query did not have SELECT access to. prepareBuildQueryPlanForTableExpression only checks table_expression_data.getSelectedColumnsNames(), but MergeTreeData::tryGetLookupJoin rebuilds the entity from getAllPhysicalColumnsForLookupJoin, i.e. every physical column. A user granted only the join key and requested payload columns can therefore run a join that calls this line and forces a read and cache build containing protected physical columns that the normal join plan would never read.
Please either build the table_join entity from the columns required by this join plus keys, or check SELECT on the full lookup_columns set before tryGetLookupJoin can read and cache them.
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 941/1112 (84.62%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 44 line(s) · Uncovered code |

This change adds table-wide
LOOKUP INDEXsupport forMergeTreetables and implements theSetandJointables as indices task from #87836. Unlike secondary data-skipping indices, these indices build reusable in-memory lookup structures for repeated key-value style queries against the right-hand side table.It introduces:
LOOKUP INDEX ... TYPE table_setforIN tablelookups and trivial key-only subqueriesLOOKUP INDEX ... TYPE table_joinfor direct-compatible equality joins withjoin_algorithm = 'direct'ALTER, andReplicatedMergeTreesupportCurrent limitations:
table_joinrequires non-nullable key columnstable_joinis only used for direct-compatible equality joinstable_joinis not always faster thanhash; the included benchmarks show win, near-parity, and lose cases depending on the relative sizes of the left and right tablesChangelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add
LOOKUP INDEXsupport forMergeTreetables withtable_setandtable_jointable-wide lookup indexes.Documentation entry for user-facing changes
Note
High Risk
High risk because it adds a new experimental index type that touches SQL parsing/formatting, ALTER/replication metadata, MergeTree read paths and caching, and join/set planner optimizations (including direct join behavior), which can affect query correctness and performance.
Overview
Adds experimental
LOOKUP INDEXsupport toMergeTree-family tables (gated byallow_experimental_lookup_index) with two new index types:table_setto accelerateIN table/trivial key-only subqueries andtable_jointo accelerate direct-compatible equality joins.Extends SQL/metadata plumbing end-to-end: new AST fields and parser/formatter support for
LOOKUP INDEX(includingALTER TABLE ... ADD/DROP LOOKUP INDEX), metadata storage and replication viaReplicatedMergeTreeTableMetadata, and validation to prevent misuse (e.g., noGRANULARITY, no expressions/args, and non-nullable keys fortable_join).Implements runtime lookup caches in
MergeTreeData(with invalidation on metadata/part changes) and integrates planner fast paths for sets (CollectSets) and joins (PlannerJoins/DirectKeyValueJoin) while preserving correctness by disabling the optimization when row policies or unsupported modifiers/parallel replicas are in play. Adds documentation plus new stateless/performance tests for both lookup modes.Reviewed by Cursor Bugbot for commit 6d94b00. Bugbot is set up for automated code reviews on this repo. Configure here.