Apply exact row-positioning for vector search rescoring queries#105591
Apply exact row-positioning for vector search rescoring queries#105591shankar-iyer merged 16 commits into
Conversation
Apply the fused `_distance` rewrite only when `vector_search_with_rescoring` can preserve regular reranking semantics. Unsupported cases such as `dotProduct`, non-`Array(Float32)` vectors, and non-`Float32` distance outputs keep the existing `ExpressionStep` rerank path. Also keep scanning after fallback-only matches in the second `useVectorSearch` pass.
Keep row offsets alive after clearing read hints on no-distance vector index results. Preserve `fused_rescore_vector_column` when cloning `ReadFromMergeTree` steps so unused-column pruning still keeps raw vectors for fused rescoring. Replace debug-only read hint shape assertions with release `LOGICAL_ERROR`s.
|
@alexey-milovidov @abashkeev @nikitamikhaylov, this PR is for your attention |
|
Thanks for the contribution! This PR logically matches one of the items in our roadmap - #104122 and My idea was to just enable the row-positioning optimization even when rescoring is ON (and drop the |
|
@shankar-iyer thanks, this makes sense. I originally extended the existing _distance rewrite because it was the shortest path to proving that exact row positioning helps vector_search_with_rescoring = 1: keep the vector column, compute full-precision distance in the reader, and wire the sort expression to _distance _. But I agree that the cleaner ClickHouse-native version is to decouple row-positioning from _distance:
That should preserve the existing function semantics and avoid custom distance implementations in the reader. I’ll rework the PR in this direction. |
Keep `vector_search_with_rescoring = 1` on the regular `ExpressionStep` distance path, but preserve vector-index row hints and let `MergeTreeRangeReader` filter to exact candidate rows before rescoring. Continue filling `_distance` only when it is requested; explicit `_distance` reads still ask `USearch` for distances. This removes the reader-side fused rescore prototype and related custom kernels, matching the maintainer-requested row-positioning architecture from ClickHouse#105591 (comment). Tests: - `ninja -C build_vector_pr programs/clickhouse` - `tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring 04105_vector_search_rescoring_non_float32_elements 04106_vector_search_rescoring_cosine_zero 04108_vector_search_rescoring_float64_reference_precision 04109_vector_search_rescoring_distance_select 04211_vector_search_rescoring_dot_product_fallback` (8 passed, 0 skipped) - Local benchmark on 50k x dim384: row-positioned exact rescoring 246 QPS vs full exact scan 72.8 QPS; no-rescore `_distance` path 336 QPS
Keep vector search row hints by reference while building mark ranges and remove unused includes left from the previous rescoring prototype. Build: ninja -C build_vector_pr programs/clickhouse
|
@shankar-iyer I reworked the PR in the direction you suggested. The reader-side fused rescoring prototype is gone now:
Current architecture:
So the PR now decouples row-positioning from I also kept explicit Validation added/updated:
Local verification:
Local benchmark, synthetic
So for this benchmark the reworked exact-rescoring path is about |
|
The following is a review note from Claude (posted by @shankar-iyer on its behalf). Thanks for iterating on this. A suggestion on framing for the next revision: the smallest change that makes rescoring use the existing row-positioning filter is a much safer starting point than a new code path. The current diff is ~530 lines, but the actual behavior delta ("apply the filter the index already produces, even when Concretely, the smallest sufficient change is:
That's it. No new flag, no optimizer-side plumbing, no changes to A few things in the current diff that are worth dropping for risk reasons:
On the benchmark: the relevant baseline is And the PR title still says "Fuse vector search rescoring in MergeTree reads", which no longer reflects what the PR does — nothing is being fused. Something like "Apply exact row-positioning for vector search rescoring queries" would match, with a matching changelog entry. As an aside, if agentic AI / coding tools are part of the workflow here, a tightly-scoped framing like "enable the existing row-positioning optimization even when Happy to share a sketch of the minimal patch as a separate branch if that helps as a reference point. Thanks for working on this. |
|
Please check above comment from Claude, I agree with that. |
Address review feedback from ClickHouse#105591 (comment) and ClickHouse#105591 (comment). Keep `vector_search_with_rescoring = 1` in the SQL pipeline and use vector-search read hints only as an exact row-position filter before `ExpressionStep`. Fill `_distance` only when the read header requests it. Keep an explicit `use_vector_search_result_filter` signal because `vector_search_results` can be present for mark pruning without enabling row-level filtering; the existing `PREWHERE` case covers this distinction. Remove the broad reader-side fused rescoring prototype and its extra stateless tests.
|
@shankar-iyer thanks, agreed on keeping the PR smaller and on not using the earlier local benchmark framing as a I updated the PR title/description and pushed a scoped revision:
I kept one explicit Validation:
I removed the old local perf claim from the PR description. The relevant performance comparison should be against current |
|
Workflow [PR], commit [b4574a1] Summary: ❌
AI ReviewSummaryThis PR narrows Missing context / blind spots
Final VerdictStatus: ✅ Approve |
|
Stress test run with binary quantization and rescoring_multiplier = 10 Earlier run without PR - Optimization is cool - 195 secs -> 155 secs for 1000 queries for k=100 |
|
@shankar-iyer Thanks for running the stress test! This is useful because it validates the intended production-like path: binary quantization with exact rescoring. The measured runtime improvement is about 20.6% wall-clock reduction, or about 25.9% higher query throughput for the 1000-query run. The small recall difference also makes sense to me. This PR removes the previous granule spillover behavior: instead of rescoring all rows from the granules returned by the vector index, it applies the exact row-position filter first and then lets the regular SQL expression compute the full-precision distance. That should reduce unnecessary reads and distance evaluations, but it can also remove incidental extra candidates that happened to live in the same granules. For workloads that need the previous recall level, increasing the fetch/rescoring multiplier should be the right knob. So I would frame this as a latency/read-amplification improvement, not as a recall improvement. |
| SELECT 'Test with enabled rescoring'; | ||
| -- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules | ||
| -- Expect only the exact rows returned by the vector index. The old granule-level | ||
| -- rescoring behavior also returned 18 and 17 because they are in the same granules. |
There was a problem hiding this comment.
No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.
| LIMIT 4 | ||
| SETTINGS vector_search_with_rescoring = 1; | ||
| SETTINGS vector_search_with_rescoring = 1, | ||
| optimize_move_to_prewhere = 0, |
There was a problem hiding this comment.
I think we need one more change. This location -
We should drop the !settings[Setting::vector_search_with_rescoring], please test for side effects (hopefully none). Then we don't need query_plan_optimize_prewhere = 0 in this query and next.
Disable implicit `PREWHERE` movement for vector search lookups regardless of `vector_search_with_rescoring` so the row-positioning/rescoring path does not depend on per-query `optimize_move_to_prewhere` opt-outs. The focused stateless test now exercises default `PREWHERE` settings for rescoring queries. Validation: `ninja -C build_vector_pr programs/clickhouse`; focused `02354_vector_search_rescoring_and_prewhere`; vector-search stateless sweep; `git diff --check`.
|
Thanks, this was the right simplification. I updated the PR so Validation I re-ran locally:
I also did a focused local perf sanity check for this guard change only. Setup: synthetic
I discarded a lower-multiplier selective run from perf claims because row counts diverged ( |
shankar-iyer
left a comment
There was a problem hiding this comment.
Please remove explicit setting edits in the 2 test files : query_plan_optimize_lazy_materialization, compile_expressions etc.
LGTM from my side. Let's wait for feedback from @rschu1ze
Address review feedback from ClickHouse#105591 (review). Let the vector rescoring stateless tests run with default lazy materialization, function-after-sorting, and expression compilation settings. Keep only the explicit PREWHERE toggles in the test block that intentionally exercises PREWHERE behavior. Validation: targeted stateless run for 02354_vector_search_rescoring, 02354_vector_search_rescoring_distance_in_select_list, and 02354_vector_search_rescoring_and_prewhere passed; git diff --check.
|
Addressed. I removed the explicit helper setting overrides from the two vector rescoring stateless tests:
I kept only the explicit Re-checked locally after the change:
Result: |
Avoid applying the no-rescore plan rewrite when `_distance` is already requested explicitly, because the `ReadFromMergeTree` step has already replaced the vector column in that shape. Add stateless coverage for multi-granule vector results with explicit `_distance`, and qualify the `vector_search_with_rescoring` documentation around exact row-position filtering. Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment)
rschu1ze
left a comment
There was a problem hiding this comment.
Sorry for the late review, this PR was somehow not on my radar.
I had a couple of comments but the PR overall looks good. Thanks.
| SELECT 'Test with enabled rescoring'; | ||
| -- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules | ||
| -- Expect only the exact rows returned by the vector index. The old granule-level | ||
| -- rescoring behavior also returned 18 and 17 because they are in the same granules. |
There was a problem hiding this comment.
No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.
Clarify `vector_search_with_rescoring` docs and test wording without using the `exact row-positioning` term in user-facing text. Document what `use_vector_search_result_filter` means for row filtering versus mark pruning and `_distance` population. Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment)
Update the `useVectorSearch` comment so it describes both rescoring candidate-row filtering and the non-rescoring `_distance` path accurately.
|
Pushed follow-up fixes on Addressed the latest review feedback:
Local verification on the pushed branch:
Result: |
|
📊 Cloud Performance Report ✅ AI verdict: This PR changes only the vector-similarity-search rescoring path: the query-plan optimizations that handle vector indexes (PREWHERE conflict resolution, second-pass candidate-row filtering) and the MergeTree read-hint plumbing that feeds vector distances back into the reader. ClickBench queries do not use vector-similarity indexes, so none of the 10 flagged deltas can plausibly be caused by this change. The fact that many unrelated queries all shift in the same direction by ~6-18%, with only 6 source runs against 30 base runs, points to instance/build-level variance rather than a real per-query effect. All flagged queries are downgraded to not-sure. clickbenchFlagged queries (10 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_official🟢 No significant changes Debug info
|
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 107/116 (92.24%) | Lost baseline coverage: none · Uncovered code |
|
Only failure right now is in the "Performance Comparision", unrelated and is noise. |
d562138
…es" (ClickHouse#105591) This reverts the merge of PR ClickHouse#105591 (commit d562138), reversing changes made to 6519f67. ClickHouse#105591 broke 02354_vector_search_rescoring on master across all build configs with a deterministic Code 44 ILLEGAL_COLUMN ("The _distance column is an internal virtual column of vector search and cannot be referenced directly"). Root cause is a base-skew semantic conflict between two PRs merged 65 minutes apart, neither of which saw the other in CI: - ClickHouse#107985 (merged 10:05 UTC) added a hard ILLEGAL_COLUMN guard in both passes of useVectorSearch.cpp, making _distance internal-only. It closed customer incident clickhouse-core-incidents#1654 and added test 02354_vector_search_incident1654 asserting the guard. - ClickHouse#105591 (merged 11:10 UTC) added a feature that deliberately supports referencing _distance directly, with test queries and reference output that expect it to succeed. The guard from ClickHouse#107985 pre-empts ClickHouse#105591's new code path, so ClickHouse#105591's own tests fail on master. Reverting ClickHouse#105591 (the later, feature PR) restores master to green while preserving the customer-incident ClickHouse#1654 guard. The product decision of whether _distance should be user-referenceable is left to the vector-search owners to reconcile and re-land. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…5591-vector-rescoring-distance-conflict Revert "Apply exact row-positioning for vector search rescoring queries" (ClickHouse#105591)
Pick up revert e531152 of PR ClickHouse#105591 (vector search rescoring exact row-positioning) which caused 02354_vector_search_rescoring to fail with ILLEGAL_COLUMN on _distance. Our branch was rebased onto the pre-revert master pin d562138.

This implements exact row-positioning for
vector_search_with_rescoring = 1, matching the roadmap direction in #104122 and #103745.Today rescoring can still carry the vector column downstream and evaluate the exact distance expression for all rows in selected granules. With this change, the optimizer keeps the original SQL distance expression, propagates vector-search read hints with an explicit row-filter signal, and
MergeTreeRangeReaderapplies the exact part-offset filter before the downstreamExpressionStep._distanceis still populated only when the read header requests it, so the existing no-rescoring_distancepath remains separate.The explicit
use_vector_search_result_filterflag keeps mark-pruning hints separate from row-level filtering. This avoids changing plans wherevector_search_resultsexist but the row filter is intentionally disabled, such as the explicitPREWHEREcase covered by the focused stateless test.Local checks:
ninja -C build_vector_pr programs/clickhousetests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring: 3 tests passed, 0 skipped.tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring_and_prewhere --test-runs 5: 5 tests passed, 0 skipped.Local focused benchmark:
masterat3b7724bb303ef9fa9de902; latest pushed commit5a30362c5a8is test-only50krows,Array(Float32),dim = 384vector_similarity('hnsw', 'L2Distance', 384, 'f32', 16, 64) GRANULARITY 100000000ORDER BY L2Distance(vec, ref) LIMIT 20vector_search_with_rescoring = 1,max_threads = 1,query_plan_optimize_lazy_materialization = 0,query_plan_execute_functions_after_sorting = 0master18.150521.689+19.5%0.053s0.0445s-16.0%0.056s0.0465s-17.0%0.060s0.049s-18.3%Both plans use the vector index (
Skip / vector_similarity,Granules: 7/8) and keep exact rescoring in the SQL pipeline (ReadFromMergeTreereadsid UInt32andvec Array(Float32), no_distancerewrite forvector_search_with_rescoring = 1).Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Apply exact row-positioning for vector search queries with
vector_search_with_rescoringbefore exact distance evaluation in the query pipeline.Documentation entry for user-facing changes
Version info
26.7.1.238