Apply exact row-positioning for vector search rescoring queries by skuznetsov · Pull Request #105591 · ClickHouse/ClickHouse · GitHub
Skip to content

Apply exact row-positioning for vector search rescoring queries#105591

Merged
shankar-iyer merged 16 commits into
ClickHouse:masterfrom
skuznetsov:codex/ch-vector-rescoring-fastpath
Jun 29, 2026
Merged

Apply exact row-positioning for vector search rescoring queries#105591
shankar-iyer merged 16 commits into
ClickHouse:masterfrom
skuznetsov:codex/ch-vector-rescoring-fastpath

Conversation

@skuznetsov

@skuznetsov skuznetsov commented May 21, 2026

Copy link
Copy Markdown
Contributor

This implements exact row-positioning for vector_search_with_rescoring = 1, matching the roadmap direction in #104122 and #103745.

Today rescoring can still carry the vector column downstream and evaluate the exact distance expression for all rows in selected granules. With this change, the optimizer keeps the original SQL distance expression, propagates vector-search read hints with an explicit row-filter signal, and MergeTreeRangeReader applies the exact part-offset filter before the downstream ExpressionStep. _distance is still populated only when the read header requests it, so the existing no-rescoring _distance path remains separate.

The explicit use_vector_search_result_filter flag keeps mark-pruning hints separate from row-level filtering. This avoids changing plans where vector_search_results exist but the row filter is intentionally disabled, such as the explicit PREWHERE case covered by the focused stateless test.

Local checks:

  • ninja -C build_vector_pr programs/clickhouse
  • tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring: 3 tests passed, 0 skipped.
  • tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring_and_prewhere --test-runs 5: 5 tests passed, 0 skipped.

Local focused benchmark:

  • baseline: PR merge-base master at 3b7724bb303
  • PR build: ef9fa9de902; latest pushed commit 5a30362c5a8 is test-only
  • dataset: 50k rows, Array(Float32), dim = 384
  • index: vector_similarity('hnsw', 'L2Distance', 384, 'f32', 16, 64) GRANULARITY 100000000
  • query shape: ORDER BY L2Distance(vec, ref) LIMIT 20
  • settings include vector_search_with_rescoring = 1, max_threads = 1, query_plan_optimize_lazy_materialization = 0, query_plan_execute_functions_after_sorting = 0
  • each run: 30 warmup queries, then 100 measured queries
metric merge-base master this PR delta
QPS 18.1505 21.689 +19.5%
p50 0.053s 0.0445s -16.0%
p95 0.056s 0.0465s -17.0%
p99 0.060s 0.049s -18.3%

Both plans use the vector index (Skip / vector_similarity, Granules: 7/8) and keep exact rescoring in the SQL pipeline (ReadFromMergeTree reads id UInt32 and vec Array(Float32), no _distance rewrite for vector_search_with_rescoring = 1).

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Apply exact row-positioning for vector search queries with vector_search_with_rescoring before exact distance evaluation in the query pipeline.

Documentation entry for user-facing changes

  • Documentation is not required.

Version info

  • Merged into: 26.7.1.238

Apply the fused `_distance` rewrite only when `vector_search_with_rescoring` can preserve regular reranking semantics.

Unsupported cases such as `dotProduct`, non-`Array(Float32)` vectors, and non-`Float32` distance outputs keep the existing `ExpressionStep` rerank path.

Also keep scanning after fallback-only matches in the second `useVectorSearch` pass.
Keep row offsets alive after clearing read hints on no-distance vector index results.

Preserve `fused_rescore_vector_column` when cloning `ReadFromMergeTree` steps so unused-column pruning still keeps raw vectors for fused rescoring.

Replace debug-only read hint shape assertions with release `LOGICAL_ERROR`s.
@skuznetsov

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov @abashkeev @nikitamikhaylov, this PR is for your attention

@rschu1ze rschu1ze self-assigned this May 22, 2026
@shankar-iyer

shankar-iyer commented May 25, 2026

Copy link
Copy Markdown
Member

Thanks for the contribution! This PR logically matches one of the items in our roadmap -

#104122
Apply exact row positioning optimization when rescoring with quantized vector indexes - the vector column will still be read, but rescoring distance calculation will only be executed on specific neighbour rows rather than all rows in the granule. Currently the entire optimization is disabled if rescoring is ON. Tracked here : #103745

and

#103745 (comment)

My idea was to just enable the row-positioning optimization even when rescoring is ON (and drop the _distance column). That would require a small change in MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch() and maybe also in useVectorSearch.cpp. As I wrote, currently we rescore with all vectors in a granule - we really only need to rescore with one (or handful) of vector rows in the granule that was returned by usearch. And this rescoring will happen naturally in the SQL pipeline in the ExpressionStep evaluation. Can you please explain why have you re-implemnented rescoring and added custom distance functions in MergeTreeRangeReader itself?

@skuznetsov

Copy link
Copy Markdown
Contributor Author

@shankar-iyer thanks, this makes sense.

I originally extended the existing _distance rewrite because it was the shortest path to proving that exact row positioning helps vector_search_with_rescoring = 1: keep the vector column, compute full-precision distance in the reader, and wire the sort expression to _distance _.

But I agree that the cleaner ClickHouse-native version is to decouple row-positioning from _distance:

  • for vector_search_with_rescoring = 0, keep the existing _distance rewrite;
  • for vector_search_with_rescoring = 1, propagate vector-search read hints even when _distance is not requested;
  • in MergeTreeRangeReader, build/apply the row-offset filter from the hints, but fill _distance only if the read header requests it;
  • leave the original L2Distance / cosineDistance / dotProduct expression in the SQL pipeline, so it is evaluated only after row-position filtering.

That should preserve the existing function semantics and avoid custom distance implementations in the reader. I’ll rework the PR in this direction.

Keep `vector_search_with_rescoring = 1` on the regular `ExpressionStep` distance path, but preserve vector-index row hints and let `MergeTreeRangeReader` filter to exact candidate rows before rescoring. Continue filling `_distance` only when it is requested; explicit `_distance` reads still ask `USearch` for distances.

This removes the reader-side fused rescore prototype and related custom kernels, matching the maintainer-requested row-positioning architecture from ClickHouse#105591 (comment).

Tests:

- `ninja -C build_vector_pr programs/clickhouse`

- `tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring 04105_vector_search_rescoring_non_float32_elements 04106_vector_search_rescoring_cosine_zero 04108_vector_search_rescoring_float64_reference_precision 04109_vector_search_rescoring_distance_select 04211_vector_search_rescoring_dot_product_fallback` (8 passed, 0 skipped)

- Local benchmark on 50k x dim384: row-positioned exact rescoring 246 QPS vs full exact scan 72.8 QPS; no-rescore `_distance` path 336 QPS
Keep vector search row hints by reference while building mark ranges and remove unused includes left from the previous rescoring prototype.

Build: ninja -C build_vector_pr programs/clickhouse
@skuznetsov

Copy link
Copy Markdown
Contributor Author

@shankar-iyer I reworked the PR in the direction you suggested.

The reader-side fused rescoring prototype is gone now:

  • no custom distance kernels in MergeTreeRangeReader;
  • no VectorSearchKernel;
  • no extra simsimd usage from this PR;
  • no _distance rewrite for vector_search_with_rescoring = 1.

Current architecture:

  • vector_search_with_rescoring = 0 keeps the existing optimized _distance path.
  • vector_search_with_rescoring = 1 keeps the original SQL distance expression (L2Distance, cosineDistance, dotProduct, etc.) and keeps the vector column in the read list.
  • The vector index still returns candidate row offsets.
  • ReadFromMergeTree propagates vector-search read hints even when _distance is not requested.
  • MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch now builds the exact row-position filter from those offsets.
  • _distance is filled only if the read header requests it.
  • The regular ExpressionStep then computes exact rescoring only for the candidate rows that survived the row-position filter.

So the PR now decouples row-positioning from _distance, while preserving the existing SQL pipeline semantics for exact rescoring.

I also kept explicit _distance readable under rescoring: if _distance is selected explicitly, the vector index is asked to return distances and the virtual column is populated as before.

Validation added/updated:

  • 02354_vector_search_rescoring checks that rescoring no longer rewrites to _distance.
  • The same test has a throwIf guard proving the row-position filter is applied before the downstream ExpressionStep evaluates the exact distance expression.
  • 04109_vector_search_rescoring_distance_select covers:
    • distance expression in the select list under exact rescoring;
    • no _distance rewrite in EXPLAIN;
    • explicit _distance selected under rescoring.
  • Existing/focused coverage also includes PREWHERE, non-Float32 element types, Float64 reference precision, cosine zero-vector behavior, and dotProduct fallback.

Local verification:

  • ninja -C build_vector_pr programs/clickhouse
  • focused stateless run: 8 tests passed, 0 skipped

Local benchmark, synthetic 50k rows with dim = 384, same query shape and local release build:

Path QPS
full exact scan, vector search disabled 72.75
row-positioned exact rescoring, this PR 246.45
no-rescore _distance path 335.97

So for this benchmark the reworked exact-rescoring path is about 3.39x faster than the full exact scan baseline, while still doing exact SQL-pipeline rescoring.

@shankar-iyer

Copy link
Copy Markdown
Member

The following is a review note from Claude (posted by @shankar-iyer on its behalf).


Thanks for iterating on this. A suggestion on framing for the next revision: the smallest change that makes rescoring use the existing row-positioning filter is a much safer starting point than a new code path. The current diff is ~530 lines, but the actual behavior delta ("apply the filter the index already produces, even when _distance is not in the read list") is about 25 lines spread over two files. Starting from the minimal diff and only adding complexity that's clearly justified would be easier to review and lower-risk to land.

Concretely, the smallest sufficient change is:

  1. In LoadedMergeTreeDataPartInfoForReader::setReadHints, drop the read_columns.contains("_distance") gate so vector-search hints are stored whenever they're present.
  2. In MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch, build the row-offset filter as today, but only allocate / populate the _distance column when read_sample_block.has("_distance"). The existing offset_to_distance map lookup stays.

That's it. No new flag, no optimizer-side plumbing, no changes to filterMarksUsingIndex, no changes to how the index is asked for distances.

A few things in the current diff that are worth dropping for risk reasons:

  • The strict std::is_sortedLOGICAL_ERROR in fillDistanceColumnAndFilterForVectorSearch is a real regression risk. The rows arriving at the range reader aren't always sorted by row offset in practice — binary quantization (vector_similarity('hnsw', …, 'b1', …)) is the case that's been observed, and the issue is upstream of this code in the multi-index-granule loop in filterMarksUsingIndex. The previous local sort + offset→distance map made the consumer immune to this. Replacing it with a hard throw turns a class of currently-working queries into thrown LOGICAL_ERRORs. Keep the defensive local sort; if and when the upstream ordering bug is fixed, the strictness can be added then.

  • The switch from "local sort + offset→distance map" to "in-place sort of stored hints + positional lookup" changes the contract of RangesInDataPartReadHints::vector_search_results — consumers may now assume sorted-and-paired, whereas previously they could only assume paired in arbitrary order. That contract change is independent of the rescoring fix; if it's worth making, it deserves its own PR with its own justification. As written, it adds risk to a change that doesn't need it.

  • The use_vector_search_result_filter flag in RangesInDataPartReadHints doesn't carry information that isn't already implicit in vector_search_results.has_value() — whenever vector-search hints are present, the consumer wants to apply the row filter. Adding a flag means every producer and consumer now has to be kept consistent on that flag, for no behavioral gain. Removing it would also let the matching optimizer-side helpers (enableVectorSearchRowFilter, the read_distance_column / return_distances recomputation, the extra setVectorSearchParameters call) go away.

  • Unrelated changes mixed into the diffsort_description.size() > 1!= 1 (looks like a legitimate fix to a potential front() on an empty vector), the createLocalParallelReplicas return std::make_unique<…> → temp-variable refactor, and the return optimization_applied change in optimizeVectorSearchSecondPass. Each is defensible in isolation, but bundling them inflates the review surface and makes bisection harder if anything regresses. Better as standalone PRs.

On the benchmark: the relevant baseline is master with vector_search_with_rescoring = 1 (granule-level rescoring), not a full brute-force scan. Could the table in the PR description be re-run with that row included? The expected uplift for selective queries is in the 20–30% latency range from local measurements with the minimal patch.

And the PR title still says "Fuse vector search rescoring in MergeTree reads", which no longer reflects what the PR does — nothing is being fused. Something like "Apply exact row-positioning for vector search rescoring queries" would match, with a matching changelog entry.

As an aside, if agentic AI / coding tools are part of the workflow here, a tightly-scoped framing like "enable the existing row-positioning optimization even when vector_search_with_rescoring = 1, so the filter the index already produces is applied to rescoring queries too" tends to land close to the two-file change directly. Sharing in case it's useful — broader prompts naturally produce broader diffs.

Happy to share a sketch of the minimal patch as a separate branch if that helps as a reference point. Thanks for working on this.

@shankar-iyer

Copy link
Copy Markdown
Member

Please check above comment from Claude, I agree with that.

@skuznetsov skuznetsov changed the title Fuse vector search rescoring in MergeTree reads Apply exact row-positioning for vector search rescoring queries May 26, 2026
Address review feedback from ClickHouse#105591 (comment) and ClickHouse#105591 (comment).

Keep `vector_search_with_rescoring = 1` in the SQL pipeline and use vector-search read hints only as an exact row-position filter before `ExpressionStep`.

Fill `_distance` only when the read header requests it. Keep an explicit `use_vector_search_result_filter` signal because `vector_search_results` can be present for mark pruning without enabling row-level filtering; the existing `PREWHERE` case covers this distinction.

Remove the broad reader-side fused rescoring prototype and its extra stateless tests.
@skuznetsov

Copy link
Copy Markdown
Contributor Author

@shankar-iyer thanks, agreed on keeping the PR smaller and on not using the earlier local benchmark framing as a master-vs-PR claim.

I updated the PR title/description and pushed a scoped revision:

  • removed the reader-side fused rescoring/custom distance path from the effective diff;
  • kept vector_search_with_rescoring = 1 in the normal SQL pipeline;
  • use the vector-search second pass only to mark analyzed parts where exact row offsets are available;
  • MergeTreeRangeReader applies the existing part-offset filter even when _distance is not requested;
  • _distance is still populated only when the read header requests it.

I kept one explicit use_vector_search_result_filter signal instead of storing/applying hints unconditionally. I tested the unconditional version and it changed the explicit PREWHERE case: vector_search_results can be present for mark pruning even when row-level filtering must remain disabled. The focused stateless test now covers that distinction.

Validation:

  • ninja -C build_vector_pr programs/clickhouse
  • tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring: 3 tests passed, 0 skipped.

I removed the old local perf claim from the PR description. The relevant performance comparison should be against current master with vector_search_with_rescoring = 1, not against a full exact-scan fallback shape.

@shankar-iyer shankar-iyer added the can be tested Allows running workflows for external contributors label May 26, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [b4574a1]

Summary:

job_name test_name status info comment
Performance Comparison (amd_release, master_head, 4/6) FAIL Performance dashboard
array_reduce #0::old FAIL query history IGNORED
array_reduce #0::new FAIL query history IGNORED
array_reduce #1::old FAIL query history IGNORED
array_reduce #1::new FAIL query history IGNORED
array_reduce #2::old FAIL query history IGNORED
array_reduce #2::new FAIL query history IGNORED
array_reduce #3::old FAIL query history IGNORED
array_reduce #3::new FAIL query history IGNORED
order_with_limit #0::old FAIL query history IGNORED
order_with_limit #0::new FAIL query history IGNORED
18 more test cases not shown

AI Review

Summary

This PR narrows vector_search_with_rescoring = 1 reads to vector-index candidate rows before the exact SQL distance expression runs, updates the row-hint plumbing across ReadFromMergeTree and MergeTreeRangeReader, covers multi-sub-index candidate aggregation, and updates the setting/docs text. The current diff addresses the prior multi-granule and documentation concerns, and I did not find a remaining correctness blocker.

Missing context / blind spots
  • ⚠️ I did not run the focused vector-search stateless tests locally in this review; I relied on the current code/diff, prior discussion, author-provided focused runs, and the PR CI report fetched through the repository helper.
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label May 26, 2026
Comment thread src/Processors/QueryPlan/Optimizations/useVectorSearch.cpp
@shankar-iyer

shankar-iyer commented May 27, 2026

Copy link
Copy Markdown
Member

Stress test run with binary quantization and rescoring_multiplier = 10

[2026-05-27 01:30:57] > Start execution for [Extract truthset hackernews_openai_10m_1k.tar]
[2026-05-27 01:30:57] Run command: [tar xvf hackernews_openai_10m_1k.tar]
[2026-05-27 01:30:57] hackernews_openai_10m_1k_vectors.npy
[2026-05-27 01:30:57] hackernews_openai_10m_1k_neighbours.npy
[2026-05-27 01:30:57] hackernews_openai_10m_1k_distances.npy
[2026-05-27 01:30:57] Wed May 27 03:30:57 2026  :  Loaded truth set containing 1000 vectors
[2026-05-27 01:31:01] Wed May 27 03:31:01 2026  :  Vector indexes have loaded!
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Runtime for ANN : 155.615 seconds
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Calculating recall...
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Recall is 0.9919400000000037
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Dropping table hackernews_openai ...

Earlier run without PR -

Wed Apr 29 14:35:32 2026  :  Loaded truth set containing 1000 vectors
Wed Apr 29 14:35:36 2026  :  Vector indexes have loaded!
Wed Apr 29 14:38:52 2026  :  Runtime for ANN : 195.913 seconds
Wed Apr 29 14:38:52 2026  :  Calculating recall...
Wed Apr 29 14:38:52 2026  :  Recall is 0.9959900000000024

Optimization is cool - 195 secs -> 155 secs for 1000 queries for k=100

@skuznetsov

skuznetsov commented May 27, 2026

Copy link
Copy Markdown
Contributor Author

@shankar-iyer Thanks for running the stress test!

This is useful because it validates the intended production-like path: binary quantization with exact rescoring. The measured runtime improvement is about 20.6% wall-clock reduction, or about 25.9% higher query throughput for the 1000-query run.

The small recall difference also makes sense to me. This PR removes the previous granule spillover behavior: instead of rescoring all rows from the granules returned by the vector index, it applies the exact row-position filter first and then lets the regular SQL expression compute the full-precision distance. That should reduce unnecessary reads and distance evaluations, but it can also remove incidental extra candidates that happened to live in the same granules. For workloads that need the previous recall level, increasing the fetch/rescoring multiplier should be the right knob.

So I would frame this as a latency/read-amplification improvement, not as a recall improvement.

SELECT 'Test with enabled rescoring';
-- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules
-- Expect only the exact rows returned by the vector index. The old granule-level
-- rescoring behavior also returned 18 and 17 because they are in the same granules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.

LIMIT 4
SETTINGS vector_search_with_rescoring = 1;
SETTINGS vector_search_with_rescoring = 1,
optimize_move_to_prewhere = 0,

@shankar-iyer shankar-iyer May 27, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need one more change. This location -

if (read_from_merge_tree_step && read_from_merge_tree_step->getVectorSearchParameters().has_value() && !settings[Setting::vector_search_with_rescoring])

We should drop the !settings[Setting::vector_search_with_rescoring], please test for side effects (hopefully none). Then we don't need query_plan_optimize_prewhere = 0 in this query and next.

Disable implicit `PREWHERE` movement for vector search lookups regardless of `vector_search_with_rescoring` so the row-positioning/rescoring path does not depend on per-query `optimize_move_to_prewhere` opt-outs.

The focused stateless test now exercises default `PREWHERE` settings for rescoring queries.

Validation: `ninja -C build_vector_pr programs/clickhouse`; focused `02354_vector_search_rescoring_and_prewhere`; vector-search stateless sweep; `git diff --check`.
@skuznetsov

Copy link
Copy Markdown
Contributor Author

Thanks, this was the right simplification. I updated the PR so optimizePrewhere now skips implicit PREWHERE movement for any vector search lookup, not only when vector_search_with_rescoring is disabled. That also lets the rescoring/prewhere regression test use the default optimize_move_to_prewhere / query_plan_optimize_prewhere settings instead of per-query opt-outs.

Validation I re-ran locally:

  • ninja -C build_vector_pr programs/clickhouse
  • focused 02354_vector_search_rescoring_and_prewhere
  • vector-search stateless sweep: 02354_vector_search_rescoring, 02354_vector_search_rescoring_distance_in_select_list, 02354_vector_search_pre_and_post_filtering, 02354_vector_search_binary_quantization, 02354_vector_search_rescoring_and_prewhere
  • git diff --check

I also did a focused local perf sanity check for this guard change only. Setup: synthetic 50k rows, dim=384, vector_similarity('hnsw', 'L2Distance', ...), 100 query iterations, max_threads = 1, expression/aggregate/sort compilation disabled, and vector_search_index_fetch_multiplier = 20 so result row counts match before/after. Both plans still materialize explicit L2Distance and use the vector_similarity skip index; _distance does not appear in the valid EXPLAIN output.

case before after delta
attr1 >= 500 QPS 73.4 153.6 +109%
attr1 >= 500 p50/p95/p99 12/13/13 ms 5/5/6 ms better
attr1 >= 900 QPS 140.1 151.4 +8%
attr1 >= 900 p50/p95/p99 5/6/7 ms 5/6/6 ms not worse

I discarded a lower-multiplier selective run from perf claims because row counts diverged (20 before vs 15 after), so the table above is the apples-to-apples run. This is not meant as a production benchmark, only a regression check for the requested PREWHERE guard change.

@shankar-iyer shankar-iyer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove explicit setting edits in the 2 test files : query_plan_optimize_lazy_materialization, compile_expressions etc.

LGTM from my side. Let's wait for feedback from @rschu1ze

Address review feedback from ClickHouse#105591 (review).

Let the vector rescoring stateless tests run with default lazy materialization, function-after-sorting, and expression compilation settings. Keep only the explicit PREWHERE toggles in the test block that intentionally exercises PREWHERE behavior.

Validation: targeted stateless run for 02354_vector_search_rescoring, 02354_vector_search_rescoring_distance_in_select_list, and 02354_vector_search_rescoring_and_prewhere passed; git diff --check.
@skuznetsov

Copy link
Copy Markdown
Contributor Author

Addressed. I removed the explicit helper setting overrides from the two vector rescoring stateless tests:

  • query_plan_optimize_lazy_materialization
  • query_plan_execute_functions_after_sorting
  • compile_expressions
  • compile_sort_description

I kept only the explicit PREWHERE on/off settings in the block that intentionally tests PREWHERE behavior.

Re-checked locally after the change:

  • 02354_vector_search_rescoring
  • 02354_vector_search_rescoring_distance_in_select_list
  • 02354_vector_search_rescoring_and_prewhere

Result: 3 tests passed, 0 skipped. Also ran git diff --check.

Avoid applying the no-rescore plan rewrite when `_distance` is already requested explicitly, because the `ReadFromMergeTree` step has already replaced the vector column in that shape.

Add stateless coverage for multi-granule vector results with explicit `_distance`, and qualify the `vector_search_with_rescoring` documentation around exact row-position filtering.

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)

@rschu1ze rschu1ze left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review, this PR was somehow not on my radar.

I had a couple of comments but the PR overall looks good. Thanks.

Comment thread tests/queries/0_stateless/02354_vector_search_rescoring_and_prewhere.sql Outdated
Comment thread tests/queries/0_stateless/02354_vector_search_rescoring.sql Outdated
Comment thread tests/queries/0_stateless/02354_vector_search_rescoring.sql Outdated
Comment thread tests/queries/0_stateless/02354_vector_search_rescoring.sql Outdated
SELECT 'Test with enabled rescoring';
-- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules
-- Expect only the exact rows returned by the vector index. The old granule-level
-- rescoring behavior also returned 18 and 17 because they are in the same granules.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.

Comment thread src/Storages/MergeTree/RangesInDataPart.h Outdated
Clarify `vector_search_with_rescoring` docs and test wording without using the `exact row-positioning` term in user-facing text.

Document what `use_vector_search_result_filter` means for row filtering versus mark pruning and `_distance` population.

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)

Related: ClickHouse#105591 (comment)
Update the `useVectorSearch` comment so it describes both rescoring candidate-row filtering and the non-rescoring `_distance` path accurately.
@skuznetsov

Copy link
Copy Markdown
Contributor Author

Pushed follow-up fixes on 7e227e5f4d54cc982fdcc912f5413500cea5adba.

Addressed the latest review feedback:

  • aggregated vector-search candidate rows across multiple vector index granules instead of keeping only the last sub-index result;
  • added a small-GRANULARITY 2 stateless regression so rescoring preserves candidates from multiple vector index granules;
  • added coverage for explicit _distance with and without vector_search_with_rescoring;
  • updated vector_search_with_rescoring setting text and ANN docs to describe filtering to candidate rows before final rescoring when the plan allows it;
  • adjusted test wording/comments per review: no exact row-positioning term in public/test output, no ExpressionStep mention in test text, and no old-state wording;
  • clarified the use_vector_search_result_filter comment and the vector-search/PREWHERE optimizer comment.

Local verification on the pushed branch:

  • ninja -C build_verify programs/clickhouse
  • git diff --check
  • targeted stateless run:
    • 02354_vector_search_rescoring
    • 02354_vector_search_rescoring_distance_in_select_list
    • 02354_vector_search_rescoring_and_prewhere

Result: 3 tests passed, 0 skipped.

@clickhouse-gh

clickhouse-gh Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 38 queries analysed

This PR changes only the vector-similarity-search rescoring path: the query-plan optimizations that handle vector indexes (PREWHERE conflict resolution, second-pass candidate-row filtering) and the MergeTree read-hint plumbing that feeds vector distances back into the reader. ClickBench queries do not use vector-similarity indexes, so none of the 10 flagged deltas can plausibly be caused by this change. The fact that many unrelated queries all shift in the same direction by ~6-18%, with only 6 source runs against 30 base runs, points to instance/build-level variance rather than a real per-query effect. All flagged queries are downgraded to not-sure.

clickbench

⚠️ 10 inconclusive

Flagged queries (10 of 43)
Query Verdict Baseline median (ms) PR median (ms) Change q-value Hint
⚠️ 9 not_sure 367 408 +11.3% <0.0001 PR only touches vector-search rescoring path; ClickBench Q9 uses no vector index, so this delta is off-path
⚠️ 13 not_sure 492 544 +10.7% <0.0001 Vector-search-only diff cannot reach Q13's execution path; uniform shift looks like instance variance
⚠️ 15 not_sure 249 222 -10.8% <0.0001 Flagged improvement but PR only changes vector index rescoring; Q15 unrelated, treat as noise
⚠️ 16 not_sure 778 895 +15.0% <0.0001 15% delta but PR touches only vector-search plan/reader; Q16 has no vector index, off-path
⚠️ 17 not_sure 541 619 +14.4% <0.0001 PR confined to vector-similarity rescoring; Q17 does not exercise it, so delta is variance
⚠️ 18 not_sure 1345 1535 +14.1% <0.0001 Vector-search-only change cannot affect Q18; noisy measurements and few source runs
⚠️ 31 not_sure 301 354 +17.6% <0.0001 PR scope is vector index rescoring; Q31 unrelated, uniform regression points to infra variance
⚠️ 32 not_sure 1327 1562 +17.7% <0.0001 High-variance query; vector-search-only diff cannot touch Q32, so this is off-path noise
⚠️ 33 not_sure 1461 1547 +5.9% <0.0001 6% delta, PR only changes vector rescoring; Q33 off-path, consistent with instance variance
⚠️ 34 not_sure 1460 1545 +5.8% <0.0001 Vector-search-only change cannot reach Q34; uniform shift across queries is instance-level variance

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.

tpch_adapted_1_official

🟢 No significant changes

Debug info
  • StressHouse run: 00c2a0ee-7f88-479e-936d-cb3ee5ceb9d3
  • MIRAI run: 8a62a66f-9ced-42b3-86b7-5729b443c18c
  • PR check IDs:
    • clickbench_238316_1782568369
    • clickbench_238322_1782568369
    • clickbench_238333_1782568369
    • tpch_adapted_1_official_238341_1782568369
    • tpch_adapted_1_official_238371_1782568369
    • tpch_adapted_1_official_238377_1782568369

@clickhouse-gh

clickhouse-gh Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.40% 85.40% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.60% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 107/116 (92.24%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@shankar-iyer

Copy link
Copy Markdown
Member

Only failure right now is in the "Performance Comparision", unrelated and is noise.

@shankar-iyer shankar-iyer enabled auto-merge June 29, 2026 05:33
@shankar-iyer shankar-iyer added this pull request to the merge queue Jun 29, 2026
Merged via the queue into ClickHouse:master with commit d562138 Jun 29, 2026
167 of 169 checks passed
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 29, 2026
@Algunenano

Copy link
Copy Markdown
Member

Ergus pushed a commit to Ergus/ClickHouse that referenced this pull request Jun 29, 2026
…es" (ClickHouse#105591)

This reverts the merge of PR ClickHouse#105591 (commit d562138), reversing changes
made to 6519f67.

ClickHouse#105591 broke 02354_vector_search_rescoring on master across all build configs
with a deterministic Code 44 ILLEGAL_COLUMN ("The _distance column is an
internal virtual column of vector search and cannot be referenced directly").

Root cause is a base-skew semantic conflict between two PRs merged 65 minutes
apart, neither of which saw the other in CI:
- ClickHouse#107985 (merged 10:05 UTC) added a hard ILLEGAL_COLUMN guard in both passes
  of useVectorSearch.cpp, making _distance internal-only. It closed customer
  incident clickhouse-core-incidents#1654 and added test
  02354_vector_search_incident1654 asserting the guard.
- ClickHouse#105591 (merged 11:10 UTC) added a feature that deliberately supports
  referencing _distance directly, with test queries and reference output that
  expect it to succeed.

The guard from ClickHouse#107985 pre-empts ClickHouse#105591's new code path, so ClickHouse#105591's own
tests fail on master. Reverting ClickHouse#105591 (the later, feature PR) restores master
to green while preserving the customer-incident ClickHouse#1654 guard. The product
decision of whether _distance should be user-referenceable is left to the
vector-search owners to reconcile and re-land.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ergus pushed a commit to Ergus/ClickHouse that referenced this pull request Jun 29, 2026
…5591-vector-rescoring-distance-conflict

Revert "Apply exact row-positioning for vector search rescoring queries" (ClickHouse#105591)
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Jun 30, 2026
Pick up revert e531152 of PR ClickHouse#105591 (vector search rescoring exact
row-positioning) which caused 02354_vector_search_rescoring to fail with
ILLEGAL_COLUMN on _distance. Our branch was rebased onto the pre-revert
master pin d562138.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants