Do not use read-in-order optimization with grace hash join by antaljanosbenjamin · Pull Request #102036 · ClickHouse/ClickHouse · GitHub
Skip to content

Do not use read-in-order optimization with grace hash join#102036

Merged
alexey-milovidov merged 2 commits into
masterfrom
do-not-use-read-in-order-with-grace-hash-join
Apr 10, 2026
Merged

Do not use read-in-order optimization with grace hash join#102036
alexey-milovidov merged 2 commits into
masterfrom
do-not-use-read-in-order-with-grace-hash-join

Conversation

@antaljanosbenjamin

@antaljanosbenjamin antaljanosbenjamin commented Apr 8, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes incorrect row ordering in queries that use ORDER BY with the grace_hash join algorithm. Affected queries could return results in the wrong order, producing silently incorrect output.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Claude explanation:
Root Cause

The bug is in src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp, in the findReadingStep function (line 150). The optimize_read_in_order optimization propagates the "data is sorted" property through JOIN steps,
allowing the final ORDER BY to skip a full sort and rely on the MergeTree's key ordering. However, this optimization checked only the join kind (Inner/Left) and strictness (Any/All) — it never verified whether the join algorithm
preserves input row order.

Grace hash join destroys input order because it scatters rows into buckets by hash value. Rows from bucket 0 come first, then bucket 1, etc. — this has nothing to do with the original sort key order. When the sorting step relies
on a "prefix sort" (assuming data is already ordered), the output comes out in hash-bucket order rather than key order.

The Fix

One-line change: add && !join_ptr->hasDelayedBlocks() to the condition that allows read-in-order through joins. Joins with delayed blocks (grace hash join) reorder rows, so the optimization must not propagate through them.
Regular HashJoin and ConcurrentHashJoin process rows in a single pass preserving order, so they remain unaffected.

Claude explanation:
Root Cause

  The bug is in src/Processors/QueryPlan/Optimizations/optimizeReadInOrder.cpp, in the findReadingStep function (line 150). The optimize_read_in_order optimization propagates the "data is sorted" property through JOIN steps,
  allowing the final ORDER BY to skip a full sort and rely on the MergeTree's key ordering. However, this optimization checked only the join kind (Inner/Left) and strictness (Any/All) — it never verified whether the join algorithm
  preserves input row order.

  Grace hash join destroys input order because it scatters rows into buckets by hash value. Rows from bucket 0 come first, then bucket 1, etc. — this has nothing to do with the original sort key order. When the sorting step relies
  on a "prefix sort" (assuming data is already ordered), the output comes out in hash-bucket order rather than key order.

  The Fix

  One-line change: add && !join_ptr->hasDelayedBlocks() to the condition that allows read-in-order through joins. Joins with delayed blocks (grace hash join) reorder rows, so the optimization must not propagate through them.
  Regular HashJoin and ConcurrentHashJoin process rows in a single pass preserving order, so they remain unaffected.
@clickhouse-gh

clickhouse-gh Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 8, 2026
@PedroTadim

PedroTadim commented Apr 9, 2026

Copy link
Copy Markdown
Member

Does it fix #100781 ? If so, mention it to close it.

@antaljanosbenjamin antaljanosbenjamin linked an issue Apr 9, 2026 that may be closed by this pull request
@alexey-milovidov

Copy link
Copy Markdown
Member

The flaky check failure is fixed in #102148, let's update the branch.

@clickhouse-gh

clickhouse-gh Bot commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 100.00% (8/8) | lost baseline coverage: 1 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are very clear to me, thanks!

@alexey-milovidov alexey-milovidov self-assigned this Apr 10, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Apr 10, 2026
Merged via the queue into master with commit 3412f87 Apr 10, 2026
164 checks passed
@alexey-milovidov alexey-milovidov deleted the do-not-use-read-in-order-with-grace-hash-join branch April 10, 2026 08:46
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026
antaljanosbenjamin added a commit that referenced this pull request Apr 10, 2026
It was disabled by #102036 and it is possibly the cleaner approach
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Grace hash wrong result with number of buckets

4 participants