DPhyp join reordering algorithm for inner joins by davenger · Pull Request #98798 · ClickHouse/ClickHouse · GitHub
Skip to content

DPhyp join reordering algorithm for inner joins#98798

Merged
davenger merged 79 commits into
masterfrom
dphyp_join_reordering
Jun 20, 2026
Merged

DPhyp join reordering algorithm for inner joins#98798
davenger merged 79 commits into
masterfrom
dphyp_join_reordering

Conversation

@davenger

@davenger davenger commented Mar 4, 2026

Copy link
Copy Markdown
Member

DPhyp paper: Dynamic Programming Strikes Back (Moerkotte & Neumann, SIGMOD 2008)

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the experimental dphyp join reordering algorithm for inner joins as an option for the query_plan_optimize_join_order_algorithm setting, and the query_plan_optimize_join_order_max_searched_plans setting, which bounds the join-order search and falls back to the next algorithm in the chain when the bound is exceeded; set it to 0 to keep the previous unbounded search behavior.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Adds a new join reordering algorithm and refactors shared DP join-cost evaluation, which can change chosen join orders and performance characteristics on multi-join queries. Includes new fallback/limits behavior (e.g. disconnected graphs, unsupported predicates, >=64 relations) that could affect plan selection in edge cases.

Overview
Adds a new experimental dphyp option to query_plan_optimize_join_order_algorithm, implementing the DPhyp (hypergraph-based) dynamic-programming join reordering algorithm for inner joins, and wires it into the optimizer’s algorithm-fallback chain.

Refactors join-order optimization internals by extracting shared plan evaluation (evaluateJoin), clearing dp_table between algorithms, and improving BitSet utilities/perf (intersection/subset helpers, set-difference, and returning cached source-relation bitsets by reference).

Introduces extensive stateless tests covering DPhyp correctness and fallback behavior across many join-graph shapes (chains, stars, cliques/cycles, caterpillars, hyperedges, transitive predicates, limits/over-limit, and algorithm-combination scenarios).

Reviewed by Cursor Bugbot for commit 94b6df1. Bugbot is set up for automated code reviews on this repo. Configure here.

Version info

  • Merged into: 26.6.1.1080

davenger and others added 3 commits March 4, 2026 23:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 4, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Mar 4, 2026
davenger and others added 2 commits March 5, 2026 08:17
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp Outdated
davenger and others added 4 commits March 5, 2026 12:46
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.h Outdated
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp
davenger and others added 5 commits March 6, 2026 00:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp
davenger and others added 3 commits March 6, 2026 19:51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@davenger davenger force-pushed the dphyp_join_reordering branch from 652638f to 2c70ed9 Compare March 8, 2026 11:53
Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp
Comment thread tests/queries/0_stateless/03960_join_order_dphyp_cycle.sql
Comment thread tests/queries/0_stateless/03960_join_order_dphyp_triangle.sql
davenger and others added 2 commits June 12, 2026 18:36
…ual-cost EXPLAIN references

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
JoinKind join_kind,
std::vector<JoinActionRef *> & predicates)
{
auto selectivity = computeSelectivity(predicates, left->relations, right->relations);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluateJoin now shares the same per-predicate computeSelectivity cache across DPhyp hyperedge candidates, but that cache is keyed only by the JoinActionRef. For a complex equality such as (A + B) = (C + D), DPhyp can evaluate a different 4-way partition first; if dp_table does not yet contain the operand source sets {A,B} or {C,D}, getColumnStats returns 0 and computeSelectivity caches 1.0 for the predicate. When the valid {A,B} vs {C,D} split is evaluated later, it reuses that unselective cached value even though the composite DP entries now exist, so the hyperedge can be costed badly and lose the intended plan.

Please avoid caching selectivity when a composite operand's stats are missing, or include the operand source sets / stats availability in the cache key.

SET use_statistics = 1;
SET query_plan_join_swap_table = 'auto';
SET enable_join_runtime_filters = 0;
SET enable_parallel_replicas = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test relies on side-local predicates staying outside the join condition, but it never pins query_plan_merge_filter_into_join_condition. Under randomized settings that can be 1, and then the same a.val >= 50 shape takes the path covered by 03960_join_order_dphyp_merge_filter_fallback.sql: dphyp returns nullptr, so the dphyp-only queries here raise EXPERIMENTAL_FEATURE_ERROR instead of producing the expected counts.

Please set query_plan_merge_filter_into_join_condition = 0 in this file so it keeps testing filter-step behavior, while the dedicated fallback test covers the merged-ON behavior.

Comment thread src/Core/Settings.cpp
@clickhouse-gh

clickhouse-gh Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.30% 85.20% -0.10%
Functions 92.60% 92.50% -0.10%
Branches 77.50% 77.40% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 355/383 (92.69%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 5 line(s) · Uncovered code

Full report · Diff report

@davenger davenger enabled auto-merge June 20, 2026 08:45
@davenger davenger added this pull request to the merge queue Jun 20, 2026
Merged via the queue into master with commit 2a3a502 Jun 20, 2026
487 of 490 checks passed
@davenger davenger deleted the dphyp_join_reordering branch June 20, 2026 16:50
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 20, 2026
@fm4v

fm4v commented Jun 22, 2026

Copy link
Copy Markdown
Member

@ayakovlev-clickhouse ayakovlev-clickhouse added the comp-joins JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla... label Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp-joins JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla... pr-experimental Experimental Feature pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants