Do not use read-in-order optimization with grace hash join #102036
Merged
alexey-milovidov merged 2 commits on Apr 10, 2026
Does it fix #100781? If so, mention it to close it.
The flaky check failure is fixed in #102148; let's update the branch.
LLVM Coverage Report: changed lines 100.00% (8/8) | lost baseline coverage: 1 line(s)
alexey-milovidov approved these changes on Apr 10, 2026
alexey-milovidov left a comment:
The changes are very clear to me, thanks!
antaljanosbenjamin added a commit that referenced this pull request on Apr 10, 2026:
It was disabled by #102036 and it is possibly the cleaner approach

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixes incorrect row ordering in queries that use ORDER BY with the grace_hash join algorithm. Affected queries could return results in the wrong order, producing silently incorrect output.
Documentation entry for user-facing changes
Claude explanation:
Root Cause
The bug is in `src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp`, in the `findReadingStep` function (line 150). The `optimize_read_in_order` optimization propagates the "data is sorted" property through JOIN steps. This lets the final ORDER BY skip a full sort and rely on the MergeTree table's sorting key order. However, the optimization checked only the join kind (Inner/Left) and strictness (Any/All). It never verified whether the join algorithm preserves input row order.
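A minimal sketch of why propagating the "sorted" property is valid for a single-pass join. The right side is loaded into a hash table, and the left side is streamed exactly once, so matches come out in left-row order. The sketch assumes INNER ALL semantics. `Row`, `singlePassHashJoin` and `leftKeysSorted` are hypothetical simplifications, not ClickHouse's `Block` or `IJoin` interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplified row: a join key plus a payload (not a ClickHouse type).
struct Row {
    int key;
    std::string value;
};

// Single-pass hash join (INNER, ALL): build a hash table from the right side,
// then stream the left side once, emitting matches in left-row order.
std::vector<std::pair<Row, Row>> singlePassHashJoin(const std::vector<Row>& left,
                                                    const std::vector<Row>& right) {
    std::unordered_multimap<int, Row> table;
    for (const Row& r : right)
        table.emplace(r.key, r);

    std::vector<std::pair<Row, Row>> out;
    for (const Row& l : left) {  // left rows are visited strictly in input order
        auto range = table.equal_range(l.key);
        for (auto it = range.first; it != range.second; ++it)
            out.emplace_back(l, it->second);
    }
    return out;
}

// True if the joined output is still ordered by the left-side key.
bool leftKeysSorted(const std::vector<std::pair<Row, Row>>& rows) {
    for (std::size_t i = 1; i < rows.size(); ++i)
        if (rows[i - 1].first.key > rows[i].first.key)
            return false;
    return true;
}
```

Feeding a left side that is already sorted by key (as if read in MergeTree sorting-key order) produces output that is still sorted by the left key. The order of multiple right-side matches for one key is unspecified, but that never breaks the left-key order.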
Grace hash join destroys input order because it scatters rows into buckets by hash value and emits them bucket by bucket. Rows from bucket 0 come first, then bucket 1, and so on, and this order has nothing to do with the original sort key order. The sorting step may rely on a "prefix sort", meaning it assumes the incoming data is already ordered by the sort key. In that case the output comes out in hash-bucket order rather than key order.
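A minimal sketch of the failure mode, assuming non-negative integer keys. The hypothetical `bucketOf` uses plain modulo as the bucket hash so the result is deterministic. `emitByBucket` mimics the bucket-by-bucket output order. `sortStep` is a deliberately simplified sort step that skips sorting entirely when told its input is already ordered; the real step may still do a partial finishing sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the bucket hash. Plain modulo keeps the example
// deterministic; a real hash scatters keys less regularly. Assumes key >= 0.
std::size_t bucketOf(int key, std::size_t num_buckets) {
    return static_cast<std::size_t>(key) % num_buckets;
}

// Grace-hash-style reordering: rows are scattered into buckets and then emitted
// bucket by bucket (bucket 0 first, then bucket 1, ...).
std::vector<int> emitByBucket(const std::vector<int>& sorted_keys, std::size_t num_buckets) {
    std::vector<std::vector<int>> buckets(num_buckets);
    for (int key : sorted_keys)
        buckets[bucketOf(key, num_buckets)].push_back(key);

    std::vector<int> out;
    for (const auto& bucket : buckets)
        out.insert(out.end(), bucket.begin(), bucket.end());
    return out;
}

// Simplified sort step: if the planner claims the input is already sorted,
// the rows are passed through unchanged; otherwise they are fully sorted.
std::vector<int> sortStep(std::vector<int> rows, bool input_claimed_sorted) {
    if (!input_claimed_sorted)
        std::sort(rows.begin(), rows.end());
    return rows;
}
```

With keys 1 through 8 and two buckets, `emitByBucket` returns 2, 4, 6, 8, 1, 3, 5, 7. `sortStep(..., true)` passes that through unchanged, which is the wrong result. `sortStep(..., false)` restores 1 through 8.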
The Fix
One-line change: add `&& !join_ptr->hasDelayedBlocks()` to the condition that allows read-in-order through joins. Joins with delayed blocks (grace hash join) reorder rows, so the "sorted" property must not be propagated through them. Regular `HashJoin` and `ConcurrentHashJoin` process rows in a single pass and preserve their order, so they are unaffected.
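The guard can be modeled roughly as follows. This is a hypothetical simplification for illustration only. The enums and the `Join` struct are not ClickHouse's real declarations. Only the `hasDelayedBlocks()` name and the kind/strictness conditions come from the description above.

```cpp
#include <cassert>

// Hypothetical, simplified enums; ClickHouse's real JoinKind/JoinStrictness differ.
enum class JoinKind { Inner, Left, Right, Full };
enum class JoinStrictness { Any, All, Semi, Anti };

// Hypothetical join handle; delayed_blocks models a grace-hash-style join.
struct Join {
    bool delayed_blocks = false;
    bool hasDelayedBlocks() const { return delayed_blocks; }
};

// Before the fix: only join kind and strictness were checked.
bool canReadInOrderThroughJoinOld(JoinKind kind, JoinStrictness strictness, const Join&) {
    return (kind == JoinKind::Inner || kind == JoinKind::Left)
        && (strictness == JoinStrictness::Any || strictness == JoinStrictness::All);
}

// After the fix: additionally reject joins that emit rows from delayed buckets.
bool canReadInOrderThroughJoin(JoinKind kind, JoinStrictness strictness, const Join& join) {
    return canReadInOrderThroughJoinOld(kind, strictness, join) && !join.hasDelayedBlocks();
}
```

An Inner/All join with no delayed blocks still passes the check. With delayed blocks, the same kind and strictness now fail it, even though the old check accepted them.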