Throw ILLEGAL_COLUMN when _distance is selected directly in vector search queries #108423
The `_distance` virtual column is internal to the vector search optimization and
populated only when the optimized plan rewrites the query. Previously, referencing
it directly in SELECT with an ORDER BY distance function caused a LOGICAL_ERROR
("Vector column unexpectedly already replaced") because the optimizer tried to add
`_distance` to the read list while it was already there from the user's SELECT.
Now this case throws a user-facing ILLEGAL_COLUMN error with a clear message
directing users to use the distance function (L2Distance, cosineDistance) in
ORDER BY instead.
The check is placed in both the first pass (tryUseVectorSearch) and the second pass
(optimizeVectorSearchSecondPass) for defense-in-depth coverage.
Closes: ClickHouse/clickhouse-core-incidents#1654
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    if (read_from_mergetree_step->isVectorColumnReplaced())
        throw Exception(ErrorCodes::ILLEGAL_COLUMN,
            "The `_distance` column is an internal virtual column of vector search and cannot be referenced directly in queries. "
            "Use the distance function (e.g. `L2Distance`, `cosineDistance`) in ORDER BY instead");
The suggested fix is misleading for the incident this PR handles: the failing query already has ORDER BY L2Distance(...); the direct reference is in the select list. As written, users can follow the instruction and still get the same error.
Please change the diagnostic in all three copies, or factor it, to tell users to select the distance expression instead of _distance, e.g. SELECT L2Distance(...) AS distance ... ORDER BY distance.
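One way to read the "factor it" suggestion is sketched below. This is a standalone illustration, not the code in the PR: `IllegalColumnError`, `distanceColumnMessage` and `throwIfDistanceColumnAlreadyRead` are hypothetical names, `std::runtime_error` stands in for ClickHouse's `Exception` with `ErrorCodes::ILLEGAL_COLUMN`, and the message wording is only an example of the requested direction (select the distance expression rather than `_distance`).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a ClickHouse Exception carrying
// ErrorCodes::ILLEGAL_COLUMN; used here so the sketch compiles on its own.
struct IllegalColumnError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Single place that owns the diagnostic text (illustrative wording).
// Every call site reuses it, so the copies cannot drift apart.
inline std::string distanceColumnMessage()
{
    return "The `_distance` column is an internal virtual column of vector search "
           "and cannot be referenced directly in queries. "
           "Select the distance expression instead, e.g. "
           "`SELECT L2Distance(...) AS distance ... ORDER BY distance`";
}

// Hypothetical helper each optimizer pass could call. The flag models the
// result of read_from_mergetree_step->isVectorColumnReplaced().
inline void throwIfDistanceColumnAlreadyRead(bool vector_column_replaced)
{
    if (vector_column_replaced)
        throw IllegalColumnError(distanceColumnMessage());
}
```

With a helper of this shape, each pass would reduce its check to a single call, and a later change to the wording would touch one function instead of every copy.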
LLVM Coverage Report: changed C/C++ lines covered by tests: 16/22 (72.73%) | Lost baseline coverage: none
Cherry pick #108423 to 26.4: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
Cherry pick #108423 to 26.5: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
Cherry pick #108423 to 26.6: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
Backport #108423 to 26.6: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
Backport #108423 to 26.5: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
Backport #108423 to 26.4: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Vector search queries that SELECT from the `_distance` column now return a proper error instead of failing with a `LOGICAL_ERROR`.
Version info
26.7.1.186, 26.6.2.9, 26.5.4.22, 26.4.5.78