Revert "Apply exact row-positioning for vector search rescoring queries" (#105591) #108812
…es" (ClickHouse#105591)

This reverts the merge of PR ClickHouse#105591 (commit d562138), reversing changes made to 6519f67.

ClickHouse#105591 broke 02354_vector_search_rescoring on master across all build configs with a deterministic Code 44 ILLEGAL_COLUMN ("The _distance column is an internal virtual column of vector search and cannot be referenced directly").

Root cause is a base-skew semantic conflict between two PRs merged 65 minutes apart, neither of which saw the other in CI:

- ClickHouse#107985 (merged 10:05 UTC) added a hard ILLEGAL_COLUMN guard in both passes of useVectorSearch.cpp, making _distance internal-only. It closed customer incident clickhouse-core-incidents#1654 and added test 02354_vector_search_incident1654 asserting the guard.
- ClickHouse#105591 (merged 11:10 UTC) added a feature that deliberately supports referencing _distance directly, with test queries and reference output that expect it to succeed.

The guard from ClickHouse#107985 pre-empts ClickHouse#105591's new code path, so ClickHouse#105591's own tests fail on master.

Reverting ClickHouse#105591 (the later, feature PR) restores master to green while preserving the customer-incident ClickHouse#1654 guard. The product decision of whether _distance should be user-referenceable is left to the vector-search owners to reconcile and re-land.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cc @shankar-iyer @rschu1ze: master is currently red on [Edit: corrected the guard PR from #107985 to #108423 per Algunenano below; #107985 ("Add ProfileEvents to observe distributed query plan execution") is unrelated.]
Workflow [PR], commit [e531152]
AI Review
Summary: This PR cleanly reverts
Final Verdict
Status: ✅ Approve

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Description
Reverts the merge of #105591 (commit d56213874eb), reversing changes made to 6519f67a9eb.

#105591 broke 02354_vector_search_rescoring on master across all build configs (amd_debug, amd_asan_ubsan, amd_tsan, amd_llvm_coverage, arm_binary) with a deterministic Code: 44. ILLEGAL_COLUMN ("The _distance column is an internal virtual column of vector search and cannot be referenced directly in queries"). CI randomized-settings diagnosis: "Runs: 350, Failed: 350, Passed: 0. Test also fails without randomized settings (not a randomization issue)."

Report (amd_debug, parallel): https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=d56213874eb9a8630fb1dd4862525efcf564d30f&name_0=MasterCI&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29

Root cause is a base-skew semantic conflict: #105591 was developed and CI-tested on a base that predated the _distance guard, then merged onto a master that already contained it.

- #108423 ("… ILLEGAL_COLUMN when _distance is selected directly in vector search queries", merged 7625ed4f8f2 @ 2026-06-28 02:32 UTC) added a hard ILLEGAL_COLUMN guard in both passes of useVectorSearch.cpp (and in ReadFromMergeTree.cpp), making _distance an internal-only virtual column. It added 02354_vector_search_incident1654.sql (customer incident #1654) asserting the guard fires.
- #105591 (merged d56213874eb @ 2026-06-29 11:10 UTC; PR branch tip last updated 2026-06-26) added a feature that deliberately supports referencing _distance directly, with test queries and reference output in 02354_vector_search_rescoring.sql that expect those queries to succeed. Its CI ran on a guard-free base and passed.

The guard from #108423 pre-empts #105591's new code path at useVectorSearch.cpp:236, so #105591's own added queries throw ILLEGAL_COLUMN once both changes are on master. Master bisect confirms: 6519f67a9eb (pre-#105591, guard already present) was green for the existing tests; d56213874eb (#105591) is red across all configs.

Reverting #105591 (the later, feature PR) restores master to green while preserving the customer-incident #1654 guard. The product decision of whether _distance should be user-referenceable is left to the vector-search owners, who can reconcile and re-land #105591 on top of the #108423 guard.

Local verification on the reverted branch (build id 16173c3a04...):

- 02354_vector_search_rescoring output matches reference exactly
- 02354_vector_search_rescoring_and_prewhere passes
- 02354_vector_search_incident1654 guard still fires ILLEGAL_COLUMN for both queries

Note: an earlier version of this description attributed the _distance guard to #107985; that was incorrect. #107985 ("Add ProfileEvents to observe distributed query plan execution") is unrelated. The guard was added by #108423, as pointed out by @Algunenano below.

Version info
26.7.1.244
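
Illustrative note on the base-skew condition from the root-cause analysis: the sketch below shows, under stated assumptions, how one might flag master merges that a PR's CI never ran against. It is not part of ClickHouse CI. The function name `merged_unseen_by_ci` is hypothetical, and it treats the PR branch tip's last-update time as an upper bound on the base the PR was tested against. The timestamps come from this description; the time of day for the 2026-06-26 branch-tip update is not stated, so end of day is assumed.

```python
from datetime import datetime, timezone


def merged_unseen_by_ci(branch_tip_updated, pr_merged_at, master_merges):
    """Return the master merges that landed after the PR branch was last
    updated but before the PR itself merged, i.e. changes that the PR's
    own CI run could not have included (hypothetical helper)."""
    return [
        name
        for name, merged_at in master_merges.items()
        if branch_tip_updated <= merged_at < pr_merged_at
    ]


utc = timezone.utc

# Assumption: exact time of the 2026-06-26 branch-tip update is unknown;
# end of day is used as the most conservative choice.
branch_tip = datetime(2026, 6, 26, 23, 59, tzinfo=utc)

# Merge time of #105591 (the feature PR) as given in the description.
feature_merged = datetime(2026, 6, 29, 11, 10, tzinfo=utc)

# Merge time of #108423 (the guard PR) as given in the description.
master_merges = {"#108423": datetime(2026, 6, 28, 2, 32, tzinfo=utc)}

unseen = merged_unseen_by_ci(branch_tip, feature_merged, master_merges)
print(unseen)
```

With these inputs the guard PR falls inside the window between the last branch update and the merge, which matches the diagnosis that #105591's CI ran on a guard-free base. Re-running CI after merging or rebasing onto current master before landing would have closed that window.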