iframe-proxy

skuznetsov · 2026-05-21T22:33:11Z

This implements exact row-positioning for vector_search_with_rescoring = 1, matching the roadmap direction in #104122 and #103745.

Today rescoring can still carry the vector column downstream and evaluate the exact distance expression for all rows in selected granules. With this change, the optimizer keeps the original SQL distance expression, propagates vector-search read hints with an explicit row-filter signal, and MergeTreeRangeReader applies the exact part-offset filter before the downstream ExpressionStep. _distance is still populated only when the read header requests it, so the existing no-rescoring _distance path remains separate.

The explicit use_vector_search_result_filter flag keeps mark-pruning hints separate from row-level filtering. This avoids changing plans where vector_search_results exist but the row filter is intentionally disabled, such as the explicit PREWHERE case covered by the focused stateless test.

Local checks:

ninja -C build_vector_pr programs/clickhouse
tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring: 3 tests passed, 0 skipped.
tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring_and_prewhere --test-runs 5: 5 tests passed, 0 skipped.

Local focused benchmark:

baseline: PR merge-base master at 3b7724bb303
PR build: ef9fa9de902; latest pushed commit 5a30362c5a8 is test-only
dataset: 50k rows, Array(Float32), dim = 384
index: vector_similarity('hnsw', 'L2Distance', 384, 'f32', 16, 64) GRANULARITY 100000000
query shape: ORDER BY L2Distance(vec, ref) LIMIT 20
settings include vector_search_with_rescoring = 1, max_threads = 1, query_plan_optimize_lazy_materialization = 0, query_plan_execute_functions_after_sorting = 0
each run: 30 warmup queries, then 100 measured queries

Both plans use the vector index (Skip / vector_similarity, Granules: 7/8) and keep exact rescoring in the SQL pipeline (ReadFromMergeTree reads id UInt32 and vec Array(Float32), no _distance rewrite for vector_search_with_rescoring = 1).

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Apply exact row-positioning for vector search queries with vector_search_with_rescoring before exact distance evaluation in the query pipeline.

Documentation entry for user-facing changes

Documentation is not required.

Version info

Merged into: 26.7.1.238

Apply the fused `_distance` rewrite only when `vector_search_with_rescoring` can preserve regular reranking semantics. Unsupported cases such as `dotProduct`, non-`Array(Float32)` vectors, and non-`Float32` distance outputs keep the existing `ExpressionStep` rerank path. Also keep scanning after fallback-only matches in the second `useVectorSearch` pass.

Keep row offsets alive after clearing read hints on no-distance vector index results. Preserve `fused_rescore_vector_column` when cloning `ReadFromMergeTree` steps so unused-column pruning still keeps raw vectors for fused rescoring. Replace debug-only read hint shape assertions with release `LOGICAL_ERROR`s.

skuznetsov · 2026-05-21T22:48:07Z

@alexey-milovidov @abashkeev @nikitamikhaylov, this PR is for your attention

shankar-iyer · 2026-05-25T04:55:25Z

Thanks for the contribution! This PR logically matches one of the items in our roadmap -

#104122
Apply exact row positioning optimization when rescoring with quantized vector indexes - the vector column will still be read, but rescoring distance calculation will only be executed on specific neighbour rows rather than all rows in the granule. Currently the entire optimization is disabled if rescoring is ON. Tracked here : #103745

and

#103745 (comment)

My idea was to just enable the row-positioning optimization even when rescoring is ON (and drop the _distance column). That would require a small change in MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch() and maybe also in useVectorSearch.cpp. As I wrote, currently we rescore with all vectors in a granule - we really only need to rescore with one (or handful) of vector rows in the granule that was returned by usearch. And this rescoring will happen naturally in the SQL pipeline in the ExpressionStep evaluation. Can you please explain why have you re-implemnented rescoring and added custom distance functions in MergeTreeRangeReader itself?

skuznetsov · 2026-05-25T12:54:14Z

@shankar-iyer thanks, this makes sense.

I originally extended the existing _distance rewrite because it was the shortest path to proving that exact row positioning helps vector_search_with_rescoring = 1: keep the vector column, compute full-precision distance in the reader, and wire the sort expression to _distance _.

But I agree that the cleaner ClickHouse-native version is to decouple row-positioning from _distance:

for vector_search_with_rescoring = 0, keep the existing _distance rewrite;
for vector_search_with_rescoring = 1, propagate vector-search read hints even when _distance is not requested;
in MergeTreeRangeReader, build/apply the row-offset filter from the hints, but fill _distance only if the read header requests it;
leave the original L2Distance / cosineDistance / dotProduct expression in the SQL pipeline, so it is evaluated only after row-position filtering.

That should preserve the existing function semantics and avoid custom distance implementations in the reader. I’ll rework the PR in this direction.

Keep `vector_search_with_rescoring = 1` on the regular `ExpressionStep` distance path, but preserve vector-index row hints and let `MergeTreeRangeReader` filter to exact candidate rows before rescoring. Continue filling `_distance` only when it is requested; explicit `_distance` reads still ask `USearch` for distances. This removes the reader-side fused rescore prototype and related custom kernels, matching the maintainer-requested row-positioning architecture from ClickHouse#105591 (comment). Tests: - `ninja -C build_vector_pr programs/clickhouse` - `tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring 04105_vector_search_rescoring_non_float32_elements 04106_vector_search_rescoring_cosine_zero 04108_vector_search_rescoring_float64_reference_precision 04109_vector_search_rescoring_distance_select 04211_vector_search_rescoring_dot_product_fallback` (8 passed, 0 skipped) - Local benchmark on 50k x dim384: row-positioned exact rescoring 246 QPS vs full exact scan 72.8 QPS; no-rescore `_distance` path 336 QPS

Keep vector search row hints by reference while building mark ranges and remove unused includes left from the previous rescoring prototype. Build: ninja -C build_vector_pr programs/clickhouse

skuznetsov · 2026-05-25T15:34:09Z

@shankar-iyer I reworked the PR in the direction you suggested.

The reader-side fused rescoring prototype is gone now:

no custom distance kernels in MergeTreeRangeReader;
no VectorSearchKernel;
no extra simsimd usage from this PR;
no _distance rewrite for vector_search_with_rescoring = 1.

Current architecture:

vector_search_with_rescoring = 0 keeps the existing optimized _distance path.
vector_search_with_rescoring = 1 keeps the original SQL distance expression (L2Distance, cosineDistance, dotProduct, etc.) and keeps the vector column in the read list.
The vector index still returns candidate row offsets.
ReadFromMergeTree propagates vector-search read hints even when _distance is not requested.
MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch now builds the exact row-position filter from those offsets.
_distance is filled only if the read header requests it.
The regular ExpressionStep then computes exact rescoring only for the candidate rows that survived the row-position filter.

So the PR now decouples row-positioning from _distance, while preserving the existing SQL pipeline semantics for exact rescoring.

I also kept explicit _distance readable under rescoring: if _distance is selected explicitly, the vector index is asked to return distances and the virtual column is populated as before.

Validation added/updated:

02354_vector_search_rescoring checks that rescoring no longer rewrites to _distance.
The same test has a throwIf guard proving the row-position filter is applied before the downstream ExpressionStep evaluates the exact distance expression.
04109_vector_search_rescoring_distance_select covers:
- distance expression in the select list under exact rescoring;
- no _distance rewrite in EXPLAIN;
- explicit _distance selected under rescoring.
Existing/focused coverage also includes PREWHERE, non-Float32 element types, Float64 reference precision, cosine zero-vector behavior, and dotProduct fallback.

Local verification:

ninja -C build_vector_pr programs/clickhouse
focused stateless run: 8 tests passed, 0 skipped

Local benchmark, synthetic 50k rows with dim = 384, same query shape and local release build:

Path	QPS
full exact scan, vector search disabled	`72.75`
row-positioned exact rescoring, this PR	`246.45`
no-rescore `_distance` path	`335.97`

So for this benchmark the reworked exact-rescoring path is about 3.39x faster than the full exact scan baseline, while still doing exact SQL-pipeline rescoring.

shankar-iyer · 2026-05-26T04:11:37Z

The following is a review note from Claude (posted by @shankar-iyer on its behalf).

Thanks for iterating on this. A suggestion on framing for the next revision: the smallest change that makes rescoring use the existing row-positioning filter is a much safer starting point than a new code path. The current diff is ~530 lines, but the actual behavior delta ("apply the filter the index already produces, even when _distance is not in the read list") is about 25 lines spread over two files. Starting from the minimal diff and only adding complexity that's clearly justified would be easier to review and lower-risk to land.

Concretely, the smallest sufficient change is:

In LoadedMergeTreeDataPartInfoForReader::setReadHints, drop the read_columns.contains("_distance") gate so vector-search hints are stored whenever they're present.
In MergeTreeRangeReader::fillDistanceColumnAndFilterForVectorSearch, build the row-offset filter as today, but only allocate / populate the _distance column when read_sample_block.has("_distance"). The existing offset_to_distance map lookup stays.

That's it. No new flag, no optimizer-side plumbing, no changes to filterMarksUsingIndex, no changes to how the index is asked for distances.

A few things in the current diff that are worth dropping for risk reasons:

The strict std::is_sorted → LOGICAL_ERROR in fillDistanceColumnAndFilterForVectorSearch is a real regression risk. The rows arriving at the range reader aren't always sorted by row offset in practice — binary quantization (vector_similarity('hnsw', …, 'b1', …)) is the case that's been observed, and the issue is upstream of this code in the multi-index-granule loop in filterMarksUsingIndex. The previous local sort + offset→distance map made the consumer immune to this. Replacing it with a hard throw turns a class of currently-working queries into thrown LOGICAL_ERRORs. Keep the defensive local sort; if and when the upstream ordering bug is fixed, the strictness can be added then.
The switch from "local sort + offset→distance map" to "in-place sort of stored hints + positional lookup" changes the contract of RangesInDataPartReadHints::vector_search_results — consumers may now assume sorted-and-paired, whereas previously they could only assume paired in arbitrary order. That contract change is independent of the rescoring fix; if it's worth making, it deserves its own PR with its own justification. As written, it adds risk to a change that doesn't need it.
The use_vector_search_result_filter flag in RangesInDataPartReadHints doesn't carry information that isn't already implicit in vector_search_results.has_value() — whenever vector-search hints are present, the consumer wants to apply the row filter. Adding a flag means every producer and consumer now has to be kept consistent on that flag, for no behavioral gain. Removing it would also let the matching optimizer-side helpers (enableVectorSearchRowFilter, the read_distance_column / return_distances recomputation, the extra setVectorSearchParameters call) go away.
Unrelated changes mixed into the diff — sort_description.size() > 1 → != 1 (looks like a legitimate fix to a potential front() on an empty vector), the createLocalParallelReplicas return std::make_unique<…> → temp-variable refactor, and the return optimization_applied change in optimizeVectorSearchSecondPass. Each is defensible in isolation, but bundling them inflates the review surface and makes bisection harder if anything regresses. Better as standalone PRs.

On the benchmark: the relevant baseline is master with vector_search_with_rescoring = 1 (granule-level rescoring), not a full brute-force scan. Could the table in the PR description be re-run with that row included? The expected uplift for selective queries is in the 20–30% latency range from local measurements with the minimal patch.

And the PR title still says "Fuse vector search rescoring in MergeTree reads", which no longer reflects what the PR does — nothing is being fused. Something like "Apply exact row-positioning for vector search rescoring queries" would match, with a matching changelog entry.

As an aside, if agentic AI / coding tools are part of the workflow here, a tightly-scoped framing like "enable the existing row-positioning optimization even when vector_search_with_rescoring = 1, so the filter the index already produces is applied to rescoring queries too" tends to land close to the two-file change directly. Sharing in case it's useful — broader prompts naturally produce broader diffs.

Happy to share a sketch of the minimal patch as a separate branch if that helps as a reference point. Thanks for working on this.

shankar-iyer · 2026-05-26T04:12:54Z

Please check above comment from Claude, I agree with that.

Address review feedback from ClickHouse#105591 (comment) and ClickHouse#105591 (comment). Keep `vector_search_with_rescoring = 1` in the SQL pipeline and use vector-search read hints only as an exact row-position filter before `ExpressionStep`. Fill `_distance` only when the read header requests it. Keep an explicit `use_vector_search_result_filter` signal because `vector_search_results` can be present for mark pruning without enabling row-level filtering; the existing `PREWHERE` case covers this distinction. Remove the broad reader-side fused rescoring prototype and its extra stateless tests.

skuznetsov · 2026-05-26T12:51:08Z

@shankar-iyer thanks, agreed on keeping the PR smaller and on not using the earlier local benchmark framing as a master-vs-PR claim.

I updated the PR title/description and pushed a scoped revision:

removed the reader-side fused rescoring/custom distance path from the effective diff;
kept vector_search_with_rescoring = 1 in the normal SQL pipeline;
use the vector-search second pass only to mark analyzed parts where exact row offsets are available;
MergeTreeRangeReader applies the existing part-offset filter even when _distance is not requested;
_distance is still populated only when the read header requests it.

I kept one explicit use_vector_search_result_filter signal instead of storing/applying hints unconditionally. I tested the unconditional version and it changed the explicit PREWHERE case: vector_search_results can be present for mark pruning even when row-level filtering must remain disabled. The focused stateless test now covers that distinction.

Validation:

ninja -C build_vector_pr programs/clickhouse
tests/clickhouse-test -b build_vector_pr/programs/clickhouse --queries tests/queries --no-stateful --no-random-settings --no-random-merge-tree-settings --no-zookeeper --no-shard --timeout 180 02354_vector_search_rescoring: 3 tests passed, 0 skipped.

I removed the old local perf claim from the PR description. The relevant performance comparison should be against current master with vector_search_with_rescoring = 1, not against a full exact-scan fallback shape.

clickhouse-gh · 2026-05-26T16:26:44Z

Workflow [PR], commit [b4574a1]

Summary: ❌

Performance Comparison (amd_release, master_head, 4/6): Performance dashboard

job_name	test_name	status	info	comment
Performance Comparison (amd_release, master_head, 4/6)		FAIL	Performance dashboard
	array_reduce #0::old	FAIL	query history	IGNORED
	array_reduce #0::new	FAIL	query history	IGNORED
	array_reduce #1::old	FAIL	query history	IGNORED
	array_reduce #1::new	FAIL	query history	IGNORED
	array_reduce #2::old	FAIL	query history	IGNORED
	array_reduce #2::new	FAIL	query history	IGNORED
	array_reduce #3::old	FAIL	query history	IGNORED
	array_reduce #3::new	FAIL	query history	IGNORED
	order_with_limit #0::old	FAIL	query history	IGNORED
	order_with_limit #0::new	FAIL	query history	IGNORED
	18 more test cases not shown

AI Review

Summary

This PR narrows vector_search_with_rescoring = 1 reads to vector-index candidate rows before the exact SQL distance expression runs, updates the row-hint plumbing across ReadFromMergeTree and MergeTreeRangeReader, covers multi-sub-index candidate aggregation, and updates the setting/docs text. The current diff addresses the prior multi-granule and documentation concerns, and I did not find a remaining correctness blocker.

Missing context / blind spots

⚠️ I did not run the focused vector-search stateless tests locally in this review; I relied on the current code/diff, prior discussion, author-provided focused runs, and the PR CI report fetched through the repository helper.

Final Verdict

Status: ✅ Approve

shankar-iyer · 2026-05-27T02:18:29Z

Stress test run with binary quantization and rescoring_multiplier = 10

[2026-05-27 01:30:57] > Start execution for [Extract truthset hackernews_openai_10m_1k.tar]
[2026-05-27 01:30:57] Run command: [tar xvf hackernews_openai_10m_1k.tar]
[2026-05-27 01:30:57] hackernews_openai_10m_1k_vectors.npy
[2026-05-27 01:30:57] hackernews_openai_10m_1k_neighbours.npy
[2026-05-27 01:30:57] hackernews_openai_10m_1k_distances.npy
[2026-05-27 01:30:57] Wed May 27 03:30:57 2026  :  Loaded truth set containing 1000 vectors
[2026-05-27 01:31:01] Wed May 27 03:31:01 2026  :  Vector indexes have loaded!
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Runtime for ANN : 155.615 seconds
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Calculating recall...
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Recall is 0.9919400000000037
[2026-05-27 01:33:37] Wed May 27 03:33:37 2026  :  Dropping table hackernews_openai ...

Earlier run without PR -

Wed Apr 29 14:35:32 2026  :  Loaded truth set containing 1000 vectors
Wed Apr 29 14:35:36 2026  :  Vector indexes have loaded!
Wed Apr 29 14:38:52 2026  :  Runtime for ANN : 195.913 seconds
Wed Apr 29 14:38:52 2026  :  Calculating recall...
Wed Apr 29 14:38:52 2026  :  Recall is 0.9959900000000024

Optimization is cool - 195 secs -> 155 secs for 1000 queries for k=100

skuznetsov · 2026-05-27T02:26:17Z

@shankar-iyer Thanks for running the stress test!

This is useful because it validates the intended production-like path: binary quantization with exact rescoring. The measured runtime improvement is about 20.6% wall-clock reduction, or about 25.9% higher query throughput for the 1000-query run.

The small recall difference also makes sense to me. This PR removes the previous granule spillover behavior: instead of rescoring all rows from the granules returned by the vector index, it applies the exact row-position filter first and then lets the regular SQL expression compute the full-precision distance. That should reduce unnecessary reads and distance evaluations, but it can also remove incidental extra candidates that happened to live in the same granules. For workloads that need the previous recall level, increasing the fetch/rescoring multiplier should be the right knob.

So I would frame this as a latency/read-amplification improvement, not as a recall improvement.

shankar-iyer · 2026-05-27T13:02:44Z

 SELECT 'Test with enabled rescoring';
-- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules
+-- Expect only the exact rows returned by the vector index. The old granule-level
+-- rescoring behavior also returned 18 and 17 because they are in the same granules.


Good catch.

No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.

shankar-iyer · 2026-05-27T13:05:30Z

 LIMIT 4
-SETTINGS vector_search_with_rescoring = 1;
+SETTINGS vector_search_with_rescoring = 1,
+         optimize_move_to_prewhere = 0,


I think we need one more change. This location -

ClickHouse/src/Processors/QueryPlan/Optimizations/optimizePrewhere.cpp

Line 176 in 4c4e1b4

if (read_from_merge_tree_step && read_from_merge_tree_step->getVectorSearchParameters().has_value() && !settings[Setting::vector_search_with_rescoring])

We should drop the !settings[Setting::vector_search_with_rescoring], please test for side effects (hopefully none). Then we don't need query_plan_optimize_prewhere = 0 in this query and next.

Disable implicit `PREWHERE` movement for vector search lookups regardless of `vector_search_with_rescoring` so the row-positioning/rescoring path does not depend on per-query `optimize_move_to_prewhere` opt-outs. The focused stateless test now exercises default `PREWHERE` settings for rescoring queries. Validation: `ninja -C build_vector_pr programs/clickhouse`; focused `02354_vector_search_rescoring_and_prewhere`; vector-search stateless sweep; `git diff --check`.

skuznetsov · 2026-05-27T15:21:10Z

Thanks, this was the right simplification. I updated the PR so optimizePrewhere now skips implicit PREWHERE movement for any vector search lookup, not only when vector_search_with_rescoring is disabled. That also lets the rescoring/prewhere regression test use the default optimize_move_to_prewhere / query_plan_optimize_prewhere settings instead of per-query opt-outs.

Validation I re-ran locally:

ninja -C build_vector_pr programs/clickhouse
focused 02354_vector_search_rescoring_and_prewhere
vector-search stateless sweep: 02354_vector_search_rescoring, 02354_vector_search_rescoring_distance_in_select_list, 02354_vector_search_pre_and_post_filtering, 02354_vector_search_binary_quantization, 02354_vector_search_rescoring_and_prewhere
git diff --check

I also did a focused local perf sanity check for this guard change only. Setup: synthetic 50k rows, dim=384, vector_similarity('hnsw', 'L2Distance', ...), 100 query iterations, max_threads = 1, expression/aggregate/sort compilation disabled, and vector_search_index_fetch_multiplier = 20 so result row counts match before/after. Both plans still materialize explicit L2Distance and use the vector_similarity skip index; _distance does not appear in the valid EXPLAIN output.

case	before	after	delta
`attr1 >= 500` QPS	73.4	153.6	+109%
`attr1 >= 500` p50/p95/p99	12/13/13 ms	5/5/6 ms	better
`attr1 >= 900` QPS	140.1	151.4	+8%
`attr1 >= 900` p50/p95/p99	5/6/7 ms	5/6/6 ms	not worse

I discarded a lower-multiplier selective run from perf claims because row counts diverged (20 before vs 15 after), so the table above is the apples-to-apples run. This is not meant as a production benchmark, only a regression check for the requested PREWHERE guard change.

shankar-iyer

Please remove explicit setting edits in the 2 test files : query_plan_optimize_lazy_materialization, compile_expressions etc.

LGTM from my side. Let's wait for feedback from @rschu1ze

Address review feedback from ClickHouse#105591 (review). Let the vector rescoring stateless tests run with default lazy materialization, function-after-sorting, and expression compilation settings. Keep only the explicit PREWHERE toggles in the test block that intentionally exercises PREWHERE behavior. Validation: targeted stateless run for 02354_vector_search_rescoring, 02354_vector_search_rescoring_distance_in_select_list, and 02354_vector_search_rescoring_and_prewhere passed; git diff --check.

skuznetsov · 2026-05-27T20:12:56Z

Addressed. I removed the explicit helper setting overrides from the two vector rescoring stateless tests:

query_plan_optimize_lazy_materialization
query_plan_execute_functions_after_sorting
compile_expressions
compile_sort_description

I kept only the explicit PREWHERE on/off settings in the block that intentionally tests PREWHERE behavior.

Re-checked locally after the change:

02354_vector_search_rescoring
02354_vector_search_rescoring_distance_in_select_list
02354_vector_search_rescoring_and_prewhere

Result: 3 tests passed, 0 skipped. Also ran git diff --check.

Avoid applying the no-rescore plan rewrite when `_distance` is already requested explicitly, because the `ReadFromMergeTree` step has already replaced the vector column in that shape. Add stateless coverage for multi-granule vector results with explicit `_distance`, and qualify the `vector_search_with_rescoring` documentation around exact row-position filtering. Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment)

rschu1ze

Sorry for the late review, this PR was somehow not on my radar.

I had a couple of comments but the PR overall looks good. Thanks.

rschu1ze · 2026-06-24T23:04:09Z

 SELECT 'Test with enabled rescoring';
-- Expect 16 & 19, and additionally 18 and 17 because they are in the same granules
+-- Expect only the exact rows returned by the vector index. The old granule-level
+-- rescoring behavior also returned 18 and 17 because they are in the same granules.


No need to let the comment mention how things worked in an old Git state. Let's change l. 69/70 to Expect 16 & 19.

Clarify `vector_search_with_rescoring` docs and test wording without using the `exact row-positioning` term in user-facing text. Document what `use_vector_search_result_filter` means for row filtering versus mark pruning and `_distance` population. Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment) Related: ClickHouse#105591 (comment)

Update the `useVectorSearch` comment so it describes both rescoring candidate-row filtering and the non-rescoring `_distance` path accurately.

skuznetsov · 2026-06-25T01:05:20Z

Pushed follow-up fixes on 7e227e5f4d54cc982fdcc912f5413500cea5adba.

Addressed the latest review feedback:

aggregated vector-search candidate rows across multiple vector index granules instead of keeping only the last sub-index result;
added a small-GRANULARITY 2 stateless regression so rescoring preserves candidates from multiple vector index granules;
added coverage for explicit _distance with and without vector_search_with_rescoring;
updated vector_search_with_rescoring setting text and ANN docs to describe filtering to candidate rows before final rescoring when the plan allows it;
adjusted test wording/comments per review: no exact row-positioning term in public/test output, no ExpressionStep mention in test text, and no old-state wording;
clarified the use_vector_search_result_filter comment and the vector-search/PREWHERE optimizer comment.

Local verification on the pushed branch:

ninja -C build_verify programs/clickhouse
git diff --check
targeted stateless run:
- 02354_vector_search_rescoring
- 02354_vector_search_rescoring_distance_in_select_list
- 02354_vector_search_rescoring_and_prewhere

Result: 3 tests passed, 0 skipped.

clickhouse-gh · 2026-06-25T09:24:25Z

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 38 queries analysed

This PR changes only the vector-similarity-search rescoring path: the query-plan optimizations that handle vector indexes (PREWHERE conflict resolution, second-pass candidate-row filtering) and the MergeTree read-hint plumbing that feeds vector distances back into the reader. ClickBench queries do not use vector-similarity indexes, so none of the 10 flagged deltas can plausibly be caused by this change. The fact that many unrelated queries all shift in the same direction by ~6-18%, with only 6 source runs against 30 base runs, points to instance/build-level variance rather than a real per-query effect. All flagged queries are downgraded to not-sure.

clickbench

⚠️ 10 inconclusive

Flagged queries (10 of 43)

	Query	Verdict	Baseline median (ms)	PR median (ms)	Change	q-value	Hint
⚠️	9	not_sure	367	408	+11.3%	<0.0001	PR only touches vector-search rescoring path; ClickBench Q9 uses no vector index, so this delta is off-path
⚠️	13	not_sure	492	544	+10.7%	<0.0001	Vector-search-only diff cannot reach Q13's execution path; uniform shift looks like instance variance
⚠️	15	not_sure	249	222	-10.8%	<0.0001	Flagged improvement but PR only changes vector index rescoring; Q15 unrelated, treat as noise
⚠️	16	not_sure	778	895	+15.0%	<0.0001	15% delta but PR touches only vector-search plan/reader; Q16 has no vector index, off-path
⚠️	17	not_sure	541	619	+14.4%	<0.0001	PR confined to vector-similarity rescoring; Q17 does not exercise it, so delta is variance
⚠️	18	not_sure	1345	1535	+14.1%	<0.0001	Vector-search-only change cannot affect Q18; noisy measurements and few source runs
⚠️	31	not_sure	301	354	+17.6%	<0.0001	PR scope is vector index rescoring; Q31 unrelated, uniform regression points to infra variance
⚠️	32	not_sure	1327	1562	+17.7%	<0.0001	High-variance query; vector-search-only diff cannot touch Q32, so this is off-path noise
⚠️	33	not_sure	1461	1547	+5.9%	<0.0001	6% delta, PR only changes vector rescoring; Q33 off-path, consistent with instance variance
⚠️	34	not_sure	1460	1545	+5.8%	<0.0001	Vector-search-only change cannot reach Q34; uniform shift across queries is instance-level variance

_{q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.}

tpch_adapted_1_official

🟢 No significant changes

Debug info

StressHouse run: 00c2a0ee-7f88-479e-936d-cb3ee5ceb9d3
MIRAI run: 8a62a66f-9ced-42b3-86b7-5729b443c18c
PR check IDs:
- clickbench_238316_1782568369
- clickbench_238322_1782568369
- clickbench_238333_1782568369
- tpch_adapted_1_official_238341_1782568369
- tpch_adapted_1_official_238371_1782568369
- tpch_adapted_1_official_238377_1782568369

…coring-fastpath

clickhouse-gh · 2026-06-27T05:35:10Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.40%	85.40%	+0.00%
Functions	92.60%	92.60%	+0.00%
Branches	77.60%	77.60%	+0.00%

Changed lines: Changed C/C++ lines covered by tests: 107/116 (92.24%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

shankar-iyer · 2026-06-29T05:33:42Z

Only failure right now is in the "Performance Comparision", unrelated and is noise.

Algunenano · 2026-06-29T13:25:09Z

…es" (ClickHouse#105591) This reverts the merge of PR ClickHouse#105591 (commit d562138), reversing changes made to 6519f67. ClickHouse#105591 broke 02354_vector_search_rescoring on master across all build configs with a deterministic Code 44 ILLEGAL_COLUMN ("The _distance column is an internal virtual column of vector search and cannot be referenced directly"). Root cause is a base-skew semantic conflict between two PRs merged 65 minutes apart, neither of which saw the other in CI: - ClickHouse#107985 (merged 10:05 UTC) added a hard ILLEGAL_COLUMN guard in both passes of useVectorSearch.cpp, making _distance internal-only. It closed customer incident clickhouse-core-incidents#1654 and added test 02354_vector_search_incident1654 asserting the guard. - ClickHouse#105591 (merged 11:10 UTC) added a feature that deliberately supports referencing _distance directly, with test queries and reference output that expect it to succeed. The guard from ClickHouse#107985 pre-empts ClickHouse#105591's new code path, so ClickHouse#105591's own tests fail on master. Reverting ClickHouse#105591 (the later, feature PR) restores master to green while preserving the customer-incident ClickHouse#1654 guard. The product decision of whether _distance should be user-referenceable is left to the vector-search owners to reconcile and re-land. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…5591-vector-rescoring-distance-conflict Revert "Apply exact row-positioning for vector search rescoring queries" (ClickHouse#105591)

Pick up revert e531152 of PR ClickHouse#105591 (vector search rescoring exact row-positioning) which caused 02354_vector_search_rescoring to fail with ILLEGAL_COLUMN on _distance. Our branch was rebased onto the pre-revert master pin d562138.

skuznetsov added 3 commits May 21, 2026 17:14

vector search: use simsimd for fused rescoring

73b97f6

rschu1ze self-assigned this May 22, 2026

skuznetsov added 2 commits May 25, 2026 09:23

vector search: avoid candidate row copy

0b9de96

Keep vector search row hints by reference while building mark ranges and remove unused includes left from the previous rescoring prototype. Build: ninja -C build_vector_pr programs/clickhouse

skuznetsov changed the title ~~Fuse vector search rescoring in MergeTree reads~~ Apply exact row-positioning for vector search rescoring queries May 26, 2026

shankar-iyer added the can be tested Allows running workflows for external contributors label May 26, 2026

clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label May 26, 2026

clickhouse-gh Bot reviewed May 26, 2026

View reviewed changes

Comment thread src/Processors/QueryPlan/Optimizations/useVectorSearch.cpp

tests: stabilize vector search rescoring prewhere

5a30362

shankar-iyer reviewed May 27, 2026

View reviewed changes

shankar-iyer approved these changes May 27, 2026

View reviewed changes

Merge branch 'master' into codex/ch-vector-rescoring-fastpath

39e0852

rschu1ze reviewed Jun 24, 2026

View reviewed changes

skuznetsov added 2 commits June 24, 2026 20:39

Clarify vector search prewhere comment

7e227e5

Update the `useVectorSearch` comment so it describes both rescoring candidate-row filtering and the non-rescoring `_distance` path accurately.

Merge remote-tracking branch 'origin/master' into codex/ch-vector-res…

b4574a1

…coring-fastpath

shankar-iyer enabled auto-merge June 29, 2026 05:33

shankar-iyer added this pull request to the merge queue Jun 29, 2026

Merged via the queue into ClickHouse:master with commit d562138 Jun 29, 2026
167 of 169 checks passed

robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 29, 2026

groeneai mentioned this pull request Jun 29, 2026

Revert "Apply exact row-positioning for vector search rescoring queries" (#105591) #108812

Merged

This was referenced Jun 29, 2026

Fix server crash on lazy remote source with unbuilt materialized CTE #108547

Merged

Add allow_lossy_numeric_supertype setting for numeric Variant common types #107236

Open

skuznetsov mentioned this pull request Jun 29, 2026

Re-land exact row filtering for vector search rescoring queries #108846

Open

This was referenced Jun 30, 2026

Fix signed integer overflow in RefreshTask attempt log_comment #108005

Merged

Fix wedged DELETE/merge on attached table with legacy hypothesis index #108217

Merged

This was referenced Jun 30, 2026

Increase restart timeout in test_keeper_snapshot_chunked_transfer recover tests #108557

Merged

Fix Virtual row with union produces wrong result #107053

Open

metric	merge-base `master`	this PR	delta
QPS	`18.1505`	`21.689`	`+19.5%`
p50	`0.053s`	`0.0445s`	`-16.0%`
p95	`0.056s`	`0.0465s`	`-17.0%`
p99	`0.060s`	`0.049s`	`-18.3%`

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

skuznetsov commented May 21, 2026 • edited by robot-ch-test-poll2 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Version info

Uh oh!

skuznetsov commented May 21, 2026

Uh oh!

shankar-iyer commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

skuznetsov commented May 25, 2026

Uh oh!

skuznetsov commented May 25, 2026

Uh oh!

shankar-iyer commented May 26, 2026

Uh oh!

shankar-iyer commented May 26, 2026

Uh oh!

skuznetsov commented May 26, 2026

Uh oh!

clickhouse-gh Bot commented May 26, 2026 • edited by shankar-iyer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Missing context / blind spots

Final Verdict

Uh oh!

Uh oh!

shankar-iyer commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

skuznetsov commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shankar-iyer May 27, 2026

Choose a reason for hiding this comment

Uh oh!

rschu1ze Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

shankar-iyer May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

skuznetsov commented May 27, 2026

Uh oh!

shankar-iyer left a comment

Choose a reason for hiding this comment

Uh oh!

skuznetsov commented May 27, 2026

Uh oh!

rschu1ze left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rschu1ze Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

skuznetsov commented Jun 25, 2026

Uh oh!

clickhouse-gh Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

clickbench

tpch_adapted_1_official

Uh oh!

clickhouse-gh Bot commented Jun 27, 2026

LLVM Coverage Report

Uh oh!

shankar-iyer commented Jun 29, 2026

Uh oh!

skuznetsov commented May 21, 2026 •

edited by robot-ch-test-poll2

Loading

shankar-iyer commented May 25, 2026 •

edited

Loading

clickhouse-gh Bot commented May 26, 2026 •

edited by shankar-iyer

Loading

shankar-iyer commented May 27, 2026 •

edited

Loading

skuznetsov commented May 27, 2026 •

edited

Loading

shankar-iyer May 27, 2026 •

edited

Loading

clickhouse-gh Bot commented Jun 25, 2026 •

edited

Loading