Fix TYPE_MISMATCH in lazy-materialization top-K threshold for ORDER BY ... LIMIT#107060
Conversation
|
cc @KochetovNicolai @vdimir — could you review this? It fixes the #106294 regression where |
|
Workflow [PR], commit [7172412] Summary: ❌
AI ReviewSummaryThis PR fixes the Final VerdictStatus: ✅ Approve |
|
CI fully finished (Finish Workflow passed). The single failure is Stateless tests (arm_asan_ubsan, azure, sequential) -- known azure-infra-outage / db-disk "Server died" chronic class, unrelated to this PR (lazy-materialization topKFilter TYPE_MISMATCH fix). Not PR-caused. No Fast/Bugfix/Style failure. |
c096332 to
fdceaf7
Compare
fdceaf7 to
4676779
Compare
|
Done in 4676779. Thanks, this was a real false-negative: the original test passed even against the pre-fix binary. Two parts to the fix:
Validated both directions through the actual
(For comparison, the original subquery-pinned test gave 5/5 false OK on the buggy binary.) Session id: cron:clickhouse-worker-slot-1:20260612-154400 |
4676779 to
3d08b53
Compare
|
Follow-up on the The presence assertion the test added ( Fix: tagged the test Validated with the server default set to
No source change; the Session: cron:clickhouse-worker-slot-4:20260612-210100 |
MergeSortingTransform::remerge() published the top-K dynamic-filter threshold by reading physical column position 0 of the sorted chunk (chunks[0].getColumns()[0]). That assumes the first sort column is the first column in the chunk, which is not generally true. With query_plan_optimize_lazy_materialization the Sorting step's input header can place a WHERE-only column before the sort column. For `SELECT path, file_name, file_size FROM t WHERE path != '' ORDER BY file_size DESC LIMIT n` the lazy-materialized plan reads `path` (String) at position 0 and the sort column `file_size` (UInt64) at position 1. The threshold tracker then received a String path value, and the injected `__topKFilter(file_size)` prewhere called convertFieldToType(stringPath, UInt64), throwing "Cannot convert string ... to type UInt64" (TYPE_MISMATCH). The failure only surfaced via the MergeSortingTransform single-stream remerge path; the sibling PartialSortingTransform already resolves the sort column position by name (description_with_positions), so it was unaffected. Resolve the first sort column's actual position by name from header_without_constants (which matches the chunk column order) instead of hardcoding 0. Closes ClickHouse#106294 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3d08b53 to
a2e7e7a
Compare
|
CI finished on a2e7e7a. Finish Workflow passed; 147 jobs, ~229.9k tests, 1 red.
No PR-caused failures. Ready for review. |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 5/5 (100.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 2 line(s) · Uncovered code |
|
CI finished on HEAD
The lazy-materialization |
…er_column_position.sql
ed82316

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix
TYPE_MISMATCHerror ("Cannot convert string ... to type ...") forORDER BY <numeric column> ... LIMIT nqueries when lazy materialization placed another column before the sort column. The top-K threshold is now read from the correct sort column.Description
Closes #106294. Regression introduced in 26.5.
MergeSortingTransform::remerge()published the top-K dynamic-filter threshold by reading physical column position 0 of the sorted chunk (chunks[0].getColumns()[0]), assuming the first sort column is the first column in the chunk. Withquery_plan_optimize_lazy_materializationenabled, the Sorting step's input header can place aWHERE-only column before the sort column. For the reporter's shapeSELECT path, file_name, file_size FROM t WHERE path != '' ORDER BY file_size DESC LIMIT n, the lazy-materialized plan readspath(String) at position 0 and the sort columnfile_size(UInt64) at position 1, so the threshold tracker received a String value. The injected__topKFilter(file_size)prewhere then calledconvertFieldToType(stringValue, UInt64), throwingTYPE_MISMATCH.The fix resolves the first sort column's actual position by name from
header_without_constants(which matches the chunk column order), the same way the siblingPartialSortingTransformalready does. The failure only surfaced through theMergeSortingTransformsingle-stream remerge path;PartialSortingTransformwas unaffected.A stateless regression test (
04327_lazy_materialization_topk_filter_column_position) forces the remerge path (max_threads = 1) and asserts the optimized result matches the unoptimized one.Version info
26.6.1.852