Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries by rschu1ze · Pull Request #108423 · ClickHouse/ClickHouse · GitHub

Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries #108423

Merged
alexey-milovidov merged 2 commits into master from vec-distance on Jun 28, 2026


@rschu1ze rschu1ze commented Jun 24, 2026

Member

The `_distance` virtual column is internal to the vector search optimization and is populated only when the optimizer rewrites the query plan. Previously, referencing it directly in the SELECT list of a query that orders by a distance function caused a `LOGICAL_ERROR` ("Vector column unexpectedly already replaced"): the optimizer tried to add `_distance` to the read list although it was already there from the user's SELECT.
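A minimal C++ sketch of the failure mode, assuming a simplified model in which the read step is just a list of column names. `ReadStep`, `replaceVectorColumn` and the use of `std::logic_error` are hypothetical stand-ins for ClickHouse internals; the real optimizer works on query-plan steps, not string lists.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the MergeTree read step:
// only the names of the columns it will read.
struct ReadStep
{
    std::vector<std::string> columns;

    bool hasColumn(const std::string & name) const
    {
        return std::find(columns.begin(), columns.end(), name) != columns.end();
    }
};

// Models the pre-fix rewrite: swap the vector column for `_distance`.
// Finding `_distance` already present violates an internal invariant, so the
// failure surfaces as an internal (LOGICAL_ERROR-style) exception, not a user error.
void replaceVectorColumn(ReadStep & step, const std::string & vector_column)
{
    if (step.hasColumn("_distance"))
        throw std::logic_error("Vector column unexpectedly already replaced");

    step.columns.erase(std::remove(step.columns.begin(), step.columns.end(), vector_column),
                       step.columns.end());
    step.columns.push_back("_distance");
}
```

With `{"id", "vec"}` the rewrite succeeds; with `{"id", "vec", "_distance"}` (the user selected `_distance`) it throws the internal error.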

Now this case throws a user-facing `ILLEGAL_COLUMN` error with a clear message directing users to use the distance function (`L2Distance`, `cosineDistance`) in ORDER BY instead.
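A sketch of the new behavior under the same simplified model: reject the query up front with an `ILLEGAL_COLUMN`-style user error. The `ErrorCode` enum, `QueryError` type and `checkDistanceNotSelected` are hypothetical; the actual fix checks plan state (`isVectorColumnReplaced()` on the read step) rather than scanning a list of selected names.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error codes; the names mirror ClickHouse's ErrorCodes,
// but this enum is illustrative only.
enum class ErrorCode { ILLEGAL_COLUMN, LOGICAL_ERROR };

// User-facing error carrying a code, loosely modelled on an exception with an error code.
struct QueryError : std::runtime_error
{
    ErrorCode code;
    QueryError(ErrorCode code_, const std::string & message)
        : std::runtime_error(message), code(code_) {}
};

// Reject a query whose SELECT list names `_distance` before the vector search
// rewrite runs, so the user sees ILLEGAL_COLUMN instead of an internal error.
void checkDistanceNotSelected(const std::vector<std::string> & selected_columns)
{
    if (std::find(selected_columns.begin(), selected_columns.end(), "_distance") != selected_columns.end())
        throw QueryError(ErrorCode::ILLEGAL_COLUMN,
            "The `_distance` column is an internal virtual column of vector search and cannot be "
            "referenced directly in queries. Use the distance function (e.g. `L2Distance`, "
            "`cosineDistance`) in ORDER BY instead");
}
```

Because the guard runs before the rewrite, the internal invariant is never reached for this input.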

The check is placed in both the first pass (`tryUseVectorSearch`) and the second pass (`optimizeVectorSearchSecondPass`) for defense in depth.
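A sketch of the defense-in-depth placement, assuming a toy `PlanState` with a flag for "`_distance` already read". `firstPass` and `secondPass` are hypothetical stand-ins for `tryUseVectorSearch` and `optimizeVectorSearchSecondPass`, and `std::invalid_argument` stands in for the `ILLEGAL_COLUMN` exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical plan state for the sketch.
struct PlanState
{
    bool distance_selected_by_user = false; // `_distance` already in the read list
    int passes_applied = 0;                 // how many optimizer passes ran
};

// Shared guard used by both passes (the defense-in-depth part).
void throwIfDistanceSelected(const PlanState & plan)
{
    if (plan.distance_selected_by_user)
        throw std::invalid_argument("ILLEGAL_COLUMN: `_distance` cannot be referenced directly");
}

// Stand-in for the first pass (tryUseVectorSearch in the PR).
void firstPass(PlanState & plan)
{
    throwIfDistanceSelected(plan);
    ++plan.passes_applied;
}

// Stand-in for the second pass (optimizeVectorSearchSecondPass in the PR).
// It repeats the guard, so it stays safe even when reached without the first pass.
void secondPass(PlanState & plan)
{
    throwIfDistanceSelected(plan);
    ++plan.passes_applied;
}
```

Keeping the guard in one shared function means both passes enforce exactly the same condition.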

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Vector search queries that select the `_distance` column directly now return a proper `ILLEGAL_COLUMN` error instead of failing with a `LOGICAL_ERROR`.

Version info

  • Merged into: 26.7.1.186
  • Backported to: 26.6.2.9, 26.5.4.22, 26.4.5.78

…arch queries


Closes: ClickHouse/clickhouse-core-incidents#1654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

clickhouse-gh Bot commented Jun 24, 2026

Contributor

@clickhouse-gh clickhouse-gh Bot added the `pr-bugfix` label (Pull request with bugfix, not backported by default) Jun 24, 2026
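```cpp
// Reject the query with a user-facing error if the vector column was already replaced,
// i.e. `_distance` is already being read (per the PR description, because the user selected it).
if (read_from_mergetree_step->isVectorColumnReplaced())
    throw Exception(ErrorCodes::ILLEGAL_COLUMN,
        "The `_distance` column is an internal virtual column of vector search and cannot be referenced directly in queries. "
        "Use the distance function (e.g. `L2Distance`, `cosineDistance`) in ORDER BY instead");
```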

Contributor


The suggested fix is misleading for the incident this PR handles: the failing query already has `ORDER BY L2Distance(...)`; the direct reference is in the select list. As written, users can follow the instruction and still get the same error.

Please change the diagnostic in all three copies, or factor it out, to tell users to select the distance expression instead of `_distance`, e.g. `SELECT L2Distance(...) AS distance ... ORDER BY distance`.
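A minimal sketch of the reviewer's "factor it out" option: one shared definition of the message, reused by every throw site, worded as the reviewer proposes. `DISTANCE_COLUMN_DIAGNOSTIC` and `illegalDistanceColumnMessage` are hypothetical names, not code from the PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical shared constant: a single definition of the diagnostic, so the
// three throw sites cannot drift apart. Wording follows the reviewer's
// suggestion: select the distance expression, not `_distance`.
constexpr std::string_view DISTANCE_COLUMN_DIAGNOSTIC =
    "The `_distance` column is an internal virtual column of vector search and cannot be "
    "referenced directly in queries. Select the distance expression instead, e.g. "
    "`SELECT L2Distance(...) AS distance ... ORDER BY distance`";

// Each throw site would call this instead of repeating the literal.
std::string illegalDistanceColumnMessage()
{
    return std::string(DISTANCE_COLUMN_DIAGNOSTIC);
}
```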

Member


I think it is okay.

@ClickHouse ClickHouse deleted a comment from CLAassistant Jun 24, 2026
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers has signed the CLA.

✅ rschu1ze
❌ StackSlayerAI


clickhouse-gh Bot commented Jun 25, 2026

Contributor

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 85.40% | 85.40% | +0.00% |
| Functions | 92.60% | 92.60% | +0.00% |
| Branches | 77.60% | 77.60% | +0.00% |

Changed C/C++ lines covered by tests: 16/22 (72.73%). Lost baseline coverage: none.


@alexey-milovidov alexey-milovidov self-assigned this Jun 28, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 28, 2026
Merged via the queue into master with commit 7625ed4 Jun 28, 2026
167 checks passed
@alexey-milovidov alexey-milovidov deleted the vec-distance branch June 28, 2026 02:47
@robot-ch-test-poll4 robot-ch-test-poll4 added the `pr-synced-to-cloud` label (The PR is synced to the cloud repo) Jun 28, 2026
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the `pr-must-backport-synced` label (The `*-must-backport` labels are synced into the cloud Sync PR) Jun 28, 2026
robot-ch-test-poll4 added a commit that referenced this pull request Jun 28, 2026
Cherry pick #108423 to 26.4: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
robot-clickhouse added a commit that referenced this pull request Jun 28, 2026
…selected directly in vector search queries
robot-ch-test-poll4 added a commit that referenced this pull request Jun 28, 2026
Cherry pick #108423 to 26.5: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
robot-clickhouse added a commit that referenced this pull request Jun 28, 2026
…selected directly in vector search queries
robot-ch-test-poll4 added a commit that referenced this pull request Jun 28, 2026
Cherry pick #108423 to 26.6: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
robot-clickhouse added a commit that referenced this pull request Jun 28, 2026
…selected directly in vector search queries
@robot-clickhouse robot-clickhouse added the `pr-backports-created` label (Backport PRs are successfully created, it won't be processed by CI script anymore) Jun 28, 2026
clickhouse-gh Bot added a commit that referenced this pull request Jun 28, 2026
Backport #108423 to 26.6: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
alexey-milovidov added a commit that referenced this pull request Jun 29, 2026
Backport #108423 to 26.5: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries
shankar-iyer added a commit that referenced this pull request Jun 29, 2026
Backport #108423 to 26.4: Throw `ILLEGAL_COLUMN` when `_distance` is selected directly in vector search queries

Labels

  • `pr-backports-created`: Backport PRs are successfully created, it won't be processed by CI script anymore
  • `pr-bugfix`: Pull request with bugfix, not backported by default
  • `pr-must-backport-synced`: The `*-must-backport` labels are synced into the cloud Sync PR
  • `pr-synced-to-cloud`: The PR is synced to the cloud repo
  • `v26.4-must-backport`


8 participants