davenger · 2026-03-04T23:52:50Z

DPhyp paper: Dynamic Programming Strikes Back (Moerkotte & Neumann, SIGMOD 2008)

Changelog category (leave one):

Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the experimental dphyp join reordering algorithm for inner joins as an option of the query_plan_optimize_join_order_algorithm setting. Added the query_plan_optimize_join_order_max_searched_plans setting, which bounds the join-order search: when the bound is exceeded, the optimizer falls back to the next algorithm in the chain. Set it to 0 to keep the previous unbounded search behavior.
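The budget-and-fallback behavior described in this entry can be modeled roughly as follows. This is a hypothetical sketch, not the ClickHouse implementation: SearchBudget, Algorithm and optimizeWithFallback are invented names, max_plans stands in for query_plan_optimize_join_order_max_searched_plans, and the sketch assumes that each algorithm in the chain starts with a fresh budget and signals "budget exceeded" by returning no plan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical budget: counts searched plans; max_plans == 0 means unbounded.
struct SearchBudget
{
    std::size_t max_plans = 0;
    std::size_t searched = 0;

    // Records one more searched plan; returns false once the bound is exceeded.
    bool charge()
    {
        ++searched;
        return max_plans == 0 || searched <= max_plans;
    }
};

using Plan = std::string;

// An algorithm returns a plan, or std::nullopt if it gave up (e.g. budget exceeded).
using Algorithm = std::function<std::optional<Plan>(SearchBudget &)>;

// Tries each algorithm in order and falls back to the next one when it gives up.
std::optional<Plan> optimizeWithFallback(const std::vector<Algorithm> & chain, std::size_t max_plans)
{
    for (const auto & algorithm : chain)
    {
        SearchBudget budget;
        budget.max_plans = max_plans;  // assumption: every algorithm starts with a fresh budget
        if (auto plan = algorithm(budget))
            return plan;
    }
    return std::nullopt;
}
```

In this model, a search that needs 100 plans gives up under a bound of 10 and the chain moves on to the next algorithm (for example a greedy one), while a bound of 0 lets it run to completion.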

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Note

Medium Risk
Adds a new join reordering algorithm and refactors shared DP join-cost evaluation, which can change chosen join orders and performance characteristics on multi-join queries. Includes new fallback/limits behavior (e.g. disconnected graphs, unsupported predicates, >=64 relations) that could affect plan selection in edge cases.

Overview
Adds a new experimental dphyp option to query_plan_optimize_join_order_algorithm, implementing the DPhyp (hypergraph-based) dynamic-programming join reordering algorithm for inner joins, and wires it into the optimizer’s algorithm-fallback chain.
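DPhyp grows connected subgraphs by adding non-empty subsets of their neighborhood (EnumerateCsgRec in the paper). A minimal sketch of that inner loop, assuming relation sets are stored as 64-bit masks; forEachNonEmptySubset is a hypothetical helper, not a ClickHouse function. It uses the well-known subset-stepping trick, which visits the subsets in ascending numeric order.

```cpp
#include <cassert>
#include <cstdint>

using RelSet = std::uint64_t;  // bit i set <=> relation i is in the set

// Calls visit(subset) for every non-empty subset of `set`, in ascending numeric order.
template <typename Visit>
void forEachNonEmptySubset(RelSet set, Visit && visit)
{
    RelSet subset = 0;
    do
    {
        // Unsigned wrap-around makes (subset - set) & set step to the next subset of `set`.
        subset = (subset - set) & set;
        if (subset != 0)
            visit(subset);
    } while (subset != 0);
}
```

For the neighborhood {1, 3} (mask 0b1010) it visits 0b0010, 0b1000 and 0b1010; an empty neighborhood produces no calls.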

Refactors join-order optimization internals by extracting shared plan evaluation (evaluateJoin), clearing dp_table between algorithms, and improving BitSet utilities/perf (intersection/subset helpers, set-difference, and returning cached source-relation bitsets by reference).
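The BitSet helpers mentioned here reduce to a few mask operations. A minimal sketch, assuming one 64-bit word per relation set; RelationSet and its method names are hypothetical and do not reproduce the ClickHouse BitSet API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical relation set: bit i set <=> relation i is present (up to 64 relations).
struct RelationSet
{
    std::uint64_t bits = 0;

    // True if the two sets share at least one relation.
    bool intersects(RelationSet other) const { return (bits & other.bits) != 0; }

    // True if every relation in this set is also in `other`.
    bool isSubsetOf(RelationSet other) const { return (bits & ~other.bits) == 0; }

    // Set difference: relations in this set that are not in `other`.
    RelationSet minus(RelationSet other) const { return RelationSet{bits & ~other.bits}; }
};
```

Passing a single-word set by value is as cheap as passing an integer; for a wider set type, returning precomputed sets by const reference (as the PR does for cached source-relation bitsets) avoids the copies.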

Introduces extensive stateless tests covering DPhyp correctness and fallback behavior across many join-graph shapes (chains, stars, cliques/cycles, caterpillars, hyperedges, transitive predicates, limits/over-limit, and algorithm-combination scenarios).

Reviewed by Cursor Bugbot for commit 94b6df1.

Version info

Merged into: 26.6.1.1080

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

clickhouse-gh · 2026-03-04T23:53:30Z

Workflow [PR], commit [f05a714]

Summary: ✅

Performance Comparison: Performance dashboard

AI Review

Summary

This PR adds experimental dphyp join-order enumeration, plus a deterministic searched-plan budget and fallback chaining. The implementation is much closer to mergeable after the latest fixes, but I still see unresolved planner-completeness and costing issues, and two focused test gaps around the new budget/fallback behavior.

Findings

⚠️ Majors

[src/Processors/QueryPlan/Optimizations/joinOrder.cpp:1115] Overlapping operand-source hyperedges can still hide a valid no-cross-product plan. For a predicate such as c.x + b.y = b.z + a.w, buildHyperedges records the overlapping sides {b,c} and {a,b}. After {b,c} is built, emitCsg({b,c}) excludes the lower-index relation a through B_min, while getNeighborhood({a}) cannot expose {b,c} because neither operand side is a subset of {a}. dphyp can therefore fail and fall back even though the predicate is applicable to {b,c} JOIN {a}. [Dismissed by the author in "DPhyp join reordering algorithm for inner joins #98798", discussion_r3323752395.] I still consider this real because the current getNeighborhood subset checks and B_min exclusion still follow that trace.
[src/Processors/QueryPlan/Optimizations/joinOrder.cpp:945] The transitive-selectivity cost can still diverge from the executed join predicate set when the join already has another predicate. computeSelectivity prices the pair with a transitive equality, but cleanupJoinPredicates only synthesizes missing transitive predicates when expressions.empty(), so a join with A.v < C.v plus inferred A.k = C.k can be costed as filtered by A.k = C.k while executing without it. [Dismissed by the author in "DPhyp join reordering algorithm for inner joins #98798", discussion_r3363967825.] I agree this is plan quality rather than wrong results, but the cost/execution mismatch is still present in the current code.
[src/Processors/QueryPlan/Optimizations/joinOrder.cpp:945] The per-predicate selectivity cache is still keyed only by JoinActionRef. If a complex equality is first evaluated for a partition where a composite operand's dp_table stats are missing, computeSelectivity caches 1.0; a later valid partition reuses that unselective value even after the needed composite stats exist. Avoid caching when composite operand stats are unavailable, or include operand source sets / stats availability in the key.
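The completeness trace in the first finding can be replayed against the neighborhood rule from the DPhyp paper. The sketch below is a simplified, hypothetical model and not ClickHouse's getNeighborhood: relations a, b, c are bits 0, 1, 2; the hyperedge for c.x + b.y = b.z + a.w has the overlapping sides {b,c} and {a,b}; the exclusion set X plays the role of S ∪ B_min(S); and the paper's removal of subsumed edges is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using RelSet = std::uint64_t;  // bit i set <=> relation i is in the set

// Hypothetical hyperedge: a predicate connects the relations in `left` with those in `right`.
struct Hyperedge
{
    RelSet left;
    RelSet right;
};

bool isSubset(RelSet inner, RelSet outer) { return (inner & ~outer) == 0; }

// Lowest set bit, i.e. the min(v) representative used by DPhyp.
RelSet lowestBit(RelSet set) { return set & (~set + 1); }

// Simplified neighborhood N(S, X): for each edge side u contained in S whose other
// side v is disjoint from both S and the exclusion set X, add min(v).
RelSet neighborhood(const std::vector<Hyperedge> & edges, RelSet s, RelSet x)
{
    RelSet result = 0;
    // Hyperedges are undirected, so each one is checked in both orientations.
    auto consider = [&](RelSet u, RelSet v)
    {
        if (isSubset(u, s) && (v & s) == 0 && (v & x) == 0)
            result |= lowestBit(v);
    };
    for (const Hyperedge & edge : edges)
    {
        consider(edge.left, edge.right);
        consider(edge.right, edge.left);
    }
    return result;
}
```

In this model the neighborhood is empty both from {a} (neither side is a subset of {a}) and from {b,c} (the other side {a,b} always overlaps {b,c}, and a is excluded by B_min anyway), which is consistent with the reviewer's trace; it does not reproduce ClickHouse's actual edge handling.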

Tests

⚠️ [tests/queries/0_stateless/03960_join_order_dphyp_search_budget.sql:50] The dpsize,greedy case still returns 5 whether dpsize honors the budget and falls back or ignores the budget and succeeds itself. Add a dpsize-only low-budget case expecting EXPERIMENTAL_FEATURE_ERROR, then keep the fallback case.
⚠️ [tests/queries/0_stateless/03960_join_order_dphyp_single_table_predicates.sql:9] This file still does not pin query_plan_merge_filter_into_join_condition = 0. With randomized settings it can become 1, causing the dphyp-only cases to take the merged-ON unsupported path instead of testing filter-step predicates.

Final Verdict

Status: ⚠️ Request changes

Minimum actions: fix or explicitly gate the remaining dphyp hyperedge completeness issue, address the selectivity-cache mismatch for complex operands, and strengthen the two focused tests above. The transitive-selectivity mismatch should either be fixed here or intentionally split out with the risk accepted before merge.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

clickhouse-gh · 2026-06-12T21:26:30Z

+    JoinKind join_kind,
+    std::vector<JoinActionRef *> & predicates)
+{
+    auto selectivity = computeSelectivity(predicates, left->relations, right->relations);


evaluateJoin now shares the same per-predicate computeSelectivity cache across DPhyp hyperedge candidates, but that cache is keyed only by the JoinActionRef. For a complex equality such as (A + B) = (C + D), DPhyp can evaluate a different 4-way partition first; if dp_table does not yet contain the operand source sets {A,B} or {C,D}, getColumnStats returns 0 and computeSelectivity caches 1.0 for the predicate. When the valid {A,B} vs {C,D} split is evaluated later, it reuses that unselective cached value even though the composite DP entries now exist, so the hyperedge can be costed badly and lose the intended plan.

Please avoid caching selectivity when a composite operand's stats are missing, or include the operand source sets / stats availability in the cache key.
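Both remedies suggested above can be combined in a small model. This is a hypothetical sketch: SelectivityCache, the numeric predicate id and the estimate callback are invented for illustration and do not mirror computeSelectivity's real signature. The key includes the partition's source sets, and the neutral 1.0 returned when composite-operand stats are missing is never stored.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <tuple>

using RelSet = std::uint64_t;

// Hypothetical selectivity cache keyed by (predicate id, left sources, right sources).
struct SelectivityCache
{
    using Key = std::tuple<std::uint64_t, RelSet, RelSet>;
    std::map<Key, double> entries;

    // `estimate` returns std::nullopt when a composite operand's stats are not yet available.
    double get(std::uint64_t predicate_id, RelSet left, RelSet right,
               const std::function<std::optional<double>()> & estimate)
    {
        const Key key{predicate_id, left, right};
        if (auto it = entries.find(key); it != entries.end())
            return it->second;

        std::optional<double> selectivity = estimate();
        if (!selectivity)
            return 1.0;  // neutral value, deliberately not cached so a later call can retry

        entries.emplace(key, *selectivity);
        return *selectivity;
    }
};
```

Either change alone would prevent the stale 1.0 in the described scenario; keying on the partition costs one entry per (predicate, split) pair, while skipping the store only costs a recomputation when stats are missing.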

clickhouse-gh · 2026-06-13T06:11:57Z

+SET use_statistics = 1;
+SET query_plan_join_swap_table = 'auto';
+SET enable_join_runtime_filters = 0;
+SET enable_parallel_replicas = 0;


This test relies on side-local predicates staying outside the join condition, but it never pins query_plan_merge_filter_into_join_condition. Under randomized settings that can be 1, and then the same a.val >= 50 shape takes the path covered by 03960_join_order_dphyp_merge_filter_fallback.sql: dphyp returns nullptr, so the dphyp-only queries here raise EXPERIMENTAL_FEATURE_ERROR instead of producing the expected counts.

Please set query_plan_merge_filter_into_join_condition = 0 in this file so it keeps testing filter-step behavior, while the dedicated fallback test covers the merged-ON behavior.

clickhouse-gh · 2026-06-19T20:12:04Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.30%	85.20%	-0.10%
Functions	92.60%	92.50%	-0.10%
Branches	77.50%	77.40%	-0.10%

Changed lines: 355 of 383 changed C/C++ lines are covered by tests (92.69%). Lost baseline coverage (covered on master, uncovered in this PR): 5 lines.

fm4v · 2026-06-22T10:43:38Z

davenger and others added 3 commits March 4, 2026 23:22

Initial implementation of DPhyp for inner joins

99d462c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests for DPhyp

70ed9c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

More tests for DPhyp

cf5604b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

clickhouse-gh Bot added the pr-performance label Mar 4, 2026

davenger and others added 2 commits March 5, 2026 08:17

Add tags to a slow test case

63c8c23

Add dphyp to query_plan_optimize_join_order_algorithm setting description

23c0d7f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 5, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp Outdated

davenger and others added 4 commits March 5, 2026 12:46

Repro for complement neighborhood enumeration bug

ed47aa7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Change test to have only one plan with best cost

b68bb4e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

One more test for DPhyp

f7e997b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix for neighbour enumeration

be24981

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 5, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp

Fix complement enumeration

959e809

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 5, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.h Outdated

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp

davenger and others added 5 commits March 6, 2026 00:45

Clear DP table at start

ce3ace3

Removed unused field

f5aa0f2

tidy

4b974f8

Fix enumerateCmpRec

eb0ef4d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix getNeighborhood

b2542e5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed Mar 6, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp

davenger and others added 3 commits March 6, 2026 19:51

Removed unused isConnectedInGraph

0d5106f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fewer allocation

b8fe792

Logging and cleanups

2c70ed9

davenger force-pushed the dphyp_join_reordering branch from 652638f to 2c70ed9 March 8, 2026 11:53

cursor Bot reviewed Mar 8, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp Outdated

Moved Hyperedge struct to .cpp

54ffece

cursor Bot reviewed Mar 8, 2026

Comment thread src/Processors/QueryPlan/Optimizations/joinOrder.cpp

davenger added 2 commits March 9, 2026 18:26

Remove redundant tests

2b7001c

Fix hyperedge construction

c84e276

clickhouse-gh Bot reviewed Jun 12, 2026

Comment thread tests/queries/0_stateless/03960_join_order_dphyp_cycle.sql

clickhouse-gh Bot reviewed Jun 12, 2026

Comment thread tests/queries/0_stateless/03960_join_order_dphyp_triangle.sql

davenger and others added 2 commits June 12, 2026 18:36

Pin query_plan_optimize_join_order_randomize=0 in DPhyp tests with equal-cost EXPLAIN references

877d13b

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Merge branch 'master' into dphyp_join_reordering

fa4ec25

clickhouse-gh Bot reviewed Jun 12, 2026

groeneai mentioned this pull request Jun 12, 2026

Try absl::flat_hash_map in Keeper storage #85138

Open

Merge branch 'master' of github.com:ClickHouse/ClickHouse into dphyp_join_reordering

758ff7d

clickhouse-gh Bot reviewed Jun 13, 2026

Merge branch 'master' of github.com:ClickHouse/ClickHouse into dphyp_join_reordering

2248957

clickhouse-gh Bot reviewed Jun 13, 2026

Comment thread src/Core/Settings.cpp

Merge branch 'master' into dphyp_join_reordering

7486e69

alexey-milovidov mentioned this pull request Jun 13, 2026

Skip macOS-incompatible distributed tests 04327 and 04336 #107376

Merged

fkastrati self-requested a review June 15, 2026 10:14

novikd approved these changes Jun 15, 2026

clickhouse-gh Bot assigned novikd Jun 15, 2026

davenger added 2 commits June 16, 2026 08:10

Merge branch 'master' into dphyp_join_reordering

26fac07

Merge branch 'master' of github.com:ClickHouse/ClickHouse into dphyp_join_reordering

f05a714

davenger enabled auto-merge June 20, 2026 08:45

davenger added this pull request to the merge queue Jun 20, 2026

Merged via the queue into master with commit 2a3a502 Jun 20, 2026
487 of 490 checks passed

davenger deleted the dphyp_join_reordering branch June 20, 2026 16:50

robot-clickhouse-ci-1 added the pr-synced-to-cloud label Jun 20, 2026

alexey-milovidov mentioned this pull request Jun 20, 2026

Fix cardinality underestimation when join column NDVs are unknown #101398

Open

groeneai mentioned this pull request Jun 21, 2026

Fix Bad cast for a qualified asterisk over a JOIN USING key nested under a PASTE/CROSS join #108043

Merged

alexey-milovidov mentioned this pull request Jun 22, 2026

Implemented SIEVE eviction in cache framework #62756

Open

fkastrati mentioned this pull request Jun 23, 2026

Add the DPsub join-order enumeration algorithm #107351

Merged

ayakovlev-clickhouse added the comp-joins label Jun 29, 2026

6 participants

davenger commented Mar 4, 2026 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Mar 4, 2026 •

edited

Loading