Do not apply read-in-order optimizations to STREAM reads#108568
Do not apply read-in-order optimizations to STREAM reads#108568alexey-milovidov merged 4 commits into
Conversation
A STREAM read returns parts in commit order, not sorting-key order, so its output is not globally sorted by the sorting key even though the key is non-empty. checkSupportedReadingStep still accepted such a ReadFromMergeTree, so the query plan optimizer requested read-in-order on it. The read then advertised the sorting-key order (via input_order_info) and fed unsorted data to order-dependent transforms. For DISTINCT in order this aborts the server in debug/sanitizer builds with 'Equal values are not contiguous within the range assumed to be sorted' (DistinctSortedStreamTransform). The same phantom order silently affects aggregation-in-order and LIMIT BY-in-order. Reject streaming reads in checkSupportedReadingStep so none of the in-order optimizations (read-in-order, aggregation-in-order, distinct-in-order, limit-by-in-order) are applied to a STREAM read. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
cc @KochetovNicolai @novikd — could you review this? A STREAM read returns parts in commit order (not sorting-key order), but |
|
Workflow [PR], commit [9732127] Summary: ✅
AI ReviewSummaryThis PR disables in-order read optimizations for Final VerdictStatus: ✅ Approve |
STREAM reads are Linux-only (the server raises SUPPORT_IS_DISABLED for hasStream() under #ifndef OS_LINUX in QueryAnalyzer). The test was running on the macOS Fast test (arm_darwin) and failing with SUPPORT_IS_DISABLED. no-darwin skips it only on macOS while keeping full coverage on every Linux job, including the Linux Fast test where the STREAM crash repro is valid. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Pushed The previous head failed Session id: cron:clickhouse-worker-slot-1:20260626-162000 |
The stateless runner randomizes optimizer settings. The regression depends on the query-plan DISTINCT-in-order pass being enabled; if optimize_distinct_in_order or query_plan_enable_optimizations is randomized off, the query uses the generic distinct path and the unfixed build never reaches the bad input_order_info, so the test could pass without the fix. Pin enable_analyzer, query_plan_enable_optimizations and optimize_distinct_in_order so the test deterministically exercises the path the PR guards. Verified: with the pins the test crashes on the pre-fix binary and passes on the fixed binary; with the pins removed the pre-fix binary no longer crashes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed the review on 04411_streaming_distinct_in_order.sql: pinned the three trigger settings so the test deterministically exercises the query-plan DISTINCT-in-order pass this PR guards, instead of only the default configuration. SET enable_analyzer = 1;
SET query_plan_enable_optimizations = 1;
SET optimize_distinct_in_order = 1;Re-verified both directions locally (debug build):
Pushed in 1e00156. |
CI finish ledger - 1e00156CI fully finished (Finish Workflow and Style check passed; no checks queued or in progress; >20 min since the last check completed). Every failure below has an owner.
Session id: cron:our-pr-ci-monitor:20260627-003000 |
Pick up merged PR ClickHouse#108513 (fix flaky test_keeper_disks::test_snapshots_with_disks) per @alexey-milovidov request on PR ClickHouse#108568.
|
@alexey-milovidov done. Merged current master into the branch (head 9732127), so it now includes the merged #108513. CI is re-running. |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 4/5 (80.00%) | Lost baseline coverage: none · Uncovered code |
CI finish ledger — 9732127CI fully finished on this head (Finish Workflow + Mergeable Check pass; no checks queued/in_progress; >20-min ingestion buffer). After merging master to pick up the now-merged #108513, every check is green and the previously-noted flaky Session id: cron:our-pr-ci-monitor:20260629-090000 |

Related: #108554
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a logical error (
Equal values are not contiguous within the range assumed to be sorted) when runningDISTINCTover aSTREAMread (SELECT DISTINCT ... FROM table STREAM). Read-in-order optimizations are no longer applied toSTREAMreads, which produce rows in commit order rather than sorting-key order.Description
A
STREAMread (FROM ... STREAM,enable_streaming_queries = 1) returns parts in commit order viaMergeTreeCommitOrderSequentialSource, not in sorting-key order, so its output is not globally sorted by the table sorting key.checkSupportedReadingStepinoptimizeReadInOrder.cppstill accepted such aReadFromMergeTree(the sorting key is non-empty), so the query plan optimizer requested read-in-order on it. The read then advertised the sorting-key order (input_order_info->getSortDescription()), and order-dependent transforms were applied on top of unsorted data.For DISTINCT in order this feeds unsorted data to
DistinctSortedStreamTransform, which aborts the server in debug/sanitizer builds:The same phantom order silently affects aggregation-in-order and LIMIT BY-in-order on
STREAMreads.Fix: reject streaming reads in
checkSupportedReadingStep, so none of the in-order optimizations (read-in-order, aggregation-in-order, distinct-in-order, limit-by-in-order) are applied to aSTREAMread.STREAMis only valid onStorageMergeTree/StorageReplicatedMergeTree(both produceReadFromMergeTree), so the single guard covers all streaming reads; theReadFromMerge/ReadFromObjectStorageStepbranches cannot be streaming.Reproducer (crashes before the fix, returns correctly after):
Found by the AST fuzzer on the
Stress test (arm_asan_ubsan, s3)check (STID 2508-4156, querySELECT count() FROM (SELECT DISTINCT * FROM t_streaming_test STREAM LIMIT 10)):https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108554&sha=bdbb85719cfc04f07cc5084b3e4a932df073b419&name_0=PR&name_1=Stress%20test%20%28arm_asan_ubsan%2C%20s3%29
Version info
26.7.1.273