Fix TYPE_MISMATCH in lazy-materialization top-K threshold for ORDER BY ... LIMIT by groeneai · Pull Request #107060 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix TYPE_MISMATCH in lazy-materialization top-K threshold for ORDER BY ... LIMIT#107060

Merged
nickitat merged 4 commits into
ClickHouse:masterfrom
groeneai:fix/106294-lazy-materialization-topk-column-position
Jun 16, 2026
Merged

Fix TYPE_MISMATCH in lazy-materialization top-K threshold for ORDER BY ... LIMIT#107060
nickitat merged 4 commits into
ClickHouse:masterfrom
groeneai:fix/106294-lazy-materialization-topk-column-position

Conversation

@groeneai

@groeneai groeneai commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix TYPE_MISMATCH error ("Cannot convert string ... to type ...") for ORDER BY <numeric column> ... LIMIT n queries when lazy materialization placed another column before the sort column. The top-K threshold is now read from the correct sort column.

Description

Closes #106294. Regression introduced in 26.5.

MergeSortingTransform::remerge() published the top-K dynamic-filter threshold by reading physical column position 0 of the sorted chunk (chunks[0].getColumns()[0]), assuming the first sort column is the first column in the chunk. With query_plan_optimize_lazy_materialization enabled, the Sorting step's input header can place a WHERE-only column before the sort column. For the reporter's shape SELECT path, file_name, file_size FROM t WHERE path != '' ORDER BY file_size DESC LIMIT n, the lazy-materialized plan reads path (String) at position 0 and the sort column file_size (UInt64) at position 1, so the threshold tracker received a String value. The injected __topKFilter(file_size) prewhere then called convertFieldToType(stringValue, UInt64), throwing TYPE_MISMATCH.

The fix resolves the first sort column's actual position by name from header_without_constants (which matches the chunk column order), the same way the sibling PartialSortingTransform already does. The failure only surfaced through the MergeSortingTransform single-stream remerge path; PartialSortingTransform was unaffected.

A stateless regression test (04327_lazy_materialization_topk_filter_column_position) forces the remerge path (max_threads = 1) and asserts the optimized result matches the unoptimized one.

Version info

  • Merged into: 26.6.1.852

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @KochetovNicolai @vdimir — could you review this? It fixes the #106294 regression where MergeSortingTransform published the top-K dynamic-filter threshold from hardcoded chunk position 0; with lazy materialization a WHERE-only column can precede the sort column, so the threshold picked up the wrong column and __topKFilter threw TYPE_MISMATCH. The fix resolves the sort column position by name, matching what PartialSortingTransform already does.

@clickhouse-gh

clickhouse-gh Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [7172412]

Summary:

job_name test_name status info comment
Unit tests (amd_llvm_coverage) FAIL
-FunctionsStress FAIL cidb, issue ISSUE EXISTS
LLVM Coverage DROPPED IGNORED

AI Review

Summary

This PR fixes the TYPE_MISMATCH exception in lazy-materialized ORDER BY ... LIMIT top-K plans by publishing the dynamic-filter threshold from the actual first sort-column position in MergeSortingTransform, instead of assuming physical column position 0. The regression test now pins the relevant optimizer settings, asserts that __topKFilter(file_size) and LazilyReadFromMergeTree are present, and compares the optimized result against the query_plan_optimize_lazy_materialization = 0 path. I found no remaining correctness, safety, or test-coverage blockers in the current diff.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 10, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

CI fully finished (Finish Workflow passed). The single failure is Stateless tests (arm_asan_ubsan, azure, sequential) -- known azure-infra-outage / db-disk "Server died" chronic class, unrelated to this PR (lazy-materialization topKFilter TYPE_MISMATCH fix). Not PR-caused. No Fast/Bugfix/Style failure.

@nickitat nickitat self-assigned this Jun 12, 2026
@groeneai groeneai force-pushed the fix/106294-lazy-materialization-topk-column-position branch from c096332 to fdceaf7 Compare June 12, 2026 14:50
@groeneai groeneai force-pushed the fix/106294-lazy-materialization-topk-column-position branch from fdceaf7 to 4676779 Compare June 12, 2026 16:45
@groeneai

Copy link
Copy Markdown
Contributor Author

Done in 4676779. Thanks, this was a real false-negative: the original test passed even against the pre-fix binary.

Two parts to the fix:

  1. Added the EXPLAIN actions = 1 assertion you suggested. It checks that both __topKFilter(file_size) and LazilyReadFromMergeTree are present, so the threshold-publishing path is proven to be exercised.

  2. Pinned use_top_k_dynamic_filtering, query_plan_max_limit_for_top_k_optimization, query_plan_max_limit_for_lazy_materialization (and max_threads=1, max_block_size, query_plan_optimize_lazy_materialization). Important detail I found while validating: pinning them inside the scalar subqueries (as the original test did) is NOT enough. clickhouse-test injects the randomized values as session/CLI settings, and a subquery-level SETTINGS clause does not override a session-level value. So with --use_top_k_dynamic_filtering 0 injected, the optimization was disabled and the test passed against the buggy code. The pins are now SET statements at the top of the test (session level), which do take precedence over the injected CLI values. query_plan_optimize_lazy_materialization is still toggled per-subquery (1 vs 0) since that is the only intended difference between the two halves of the equality check.

Validated both directions through the actual clickhouse-test runner with randomization on:

  • pre-fix binary (hardcoded position 0): 10/10 FAIL (TYPE_MISMATCH), confirming the test now catches the regression.
  • fixed binary: 30/30 PASS.

(For comparison, the original subquery-pinned test gave 5/5 false OK on the buggy binary.)

Session id: cron:clickhouse-worker-slot-1:20260612-154400

@groeneai groeneai force-pushed the fix/106294-lazy-materialization-topk-column-position branch from 4676779 to 3d08b53 Compare June 12, 2026 21:15
@groeneai

Copy link
Copy Markdown
Contributor Author

Follow-up on the EXPLAIN actions=1 assertion (HEAD 3d08b5350ee).

The presence assertion the test added (__topKFilter(file_size) + LazilyReadFromMergeTree) correctly proves the optimized path is exercised, but it surfaced on the old analyzer matrix: lazy materialization is gated on the new analyzer (QueryPlanOptimizationSettings.cpp: optimize_lazy_materialization = query_plan_optimize_lazy_materialization && allow_experimental_analyzer). Under enable_analyzer=0 the __topKFilter/LazilyReadFromMergeTree plan never forms, so the assertion returned 0 and 04327 failed only on Stateless tests (amd_llvm_coverage, old analyzer, ...) while passing on every new-analyzer variant + 50/50 flaky checks.

Fix: tagged the test no-old-analyzer (the test validates a new-analyzer-only optimization, so it is skipped under the old analyzer, like other new-analyzer plan-assertion tests).

Validated with the server default set to enable_analyzer=0 (the old-analyzer job condition, via users.d/analyzer.xml):

  • old analyzer: [ SKIPPED ] Reason: old-analyzer (was [ FAIL ] before the tag)
  • new analyzer: [ OK ], 20/20 with randomization on

No source change; the MergeSortingTransform column-position fix is unchanged.

Session: cron:clickhouse-worker-slot-4:20260612-210100

MergeSortingTransform::remerge() published the top-K dynamic-filter
threshold by reading physical column position 0 of the sorted chunk
(chunks[0].getColumns()[0]). That assumes the first sort column is the
first column in the chunk, which is not generally true.

With query_plan_optimize_lazy_materialization the Sorting step's input
header can place a WHERE-only column before the sort column. For
`SELECT path, file_name, file_size FROM t WHERE path != '' ORDER BY
file_size DESC LIMIT n` the lazy-materialized plan reads `path` (String)
at position 0 and the sort column `file_size` (UInt64) at position 1.
The threshold tracker then received a String path value, and the
injected `__topKFilter(file_size)` prewhere called
convertFieldToType(stringPath, UInt64), throwing
"Cannot convert string ... to type UInt64" (TYPE_MISMATCH).

The failure only surfaced via the MergeSortingTransform single-stream
remerge path; the sibling PartialSortingTransform already resolves the
sort column position by name (description_with_positions), so it was
unaffected.

Resolve the first sort column's actual position by name from
header_without_constants (which matches the chunk column order) instead
of hardcoding 0.

Closes ClickHouse#106294

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai groeneai force-pushed the fix/106294-lazy-materialization-topk-column-position branch from 3d08b53 to a2e7e7a Compare June 12, 2026 23:54
@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on a2e7e7a. Finish Workflow passed; 147 jobs, ~229.9k tests, 1 red.

  • Fast test (arm_darwin) / 04336_parallel_blocks_marshalling_low_cardinality_native_format: not PR-caused. Known macOS-only flake (the test uses remote('127.0.0.{1,2}'); 127.0.0.2 is unbindable on Darwin), failing across 20+ unrelated PRs. Tracked separately with fix PR Skip macOS-incompatible distributed tests 04327 and 04336 #107376 (adds it to ci/defs/darwin.skip). This PR touches only MergeSortingTransform.cpp + the 04327 test, neither related to 04336.

No PR-caused failures. Ready for review.

@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 5/5 (100.00%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 2 line(s) · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on HEAD 446bd56f063a (the auto master-merge). Both reds are pre-existing flaky failures, not caused by this PR (which only touches MergeSortingTransform.cpp + the 04327 test):

  • Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) / 02770_async_buffer_ignore — flaky, 7 failures across 7 unrelated PRs / 0 on master in 14d. Already has a fix in flight (Fix flaky test 02770_async_buffer_ignore #107298).
  • Integration tests (arm_binary, distributed plan, 1/4) / test_refreshable_mv_keeper_loss::test_keeper_session_loss_during_coordinated_refresh — flaky keeper-session-loss fault-injection test, 6 failures across 5 unrelated PRs / 0 on master in 14d. Unrelated to sorting.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107060&sha=446bd56f063a3d7850563ef6939a4907ce62ec7e&name_0=PR

The lazy-materialization MergeSortingTransform column-position fix and its 04327 regression test are unchanged and validated both directions.

Comment thread src/Processors/Transforms/MergeSortingTransform.cpp
Comment thread src/Processors/Transforms/MergeSortingTransform.cpp
@groeneai

Copy link
Copy Markdown
Contributor Author

@nickitat nickitat added this pull request to the merge queue Jun 16, 2026
Merged via the queue into ClickHouse:master with commit ed82316 Jun 16, 2026
805 of 815 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in 26.5: lazy-materialization __topKFilter throws TYPE_MISMATCH on ORDER BY ... LIMIT (works in 26.1)

5 participants