DPhyp join reordering algorithm for inner joins#98798
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
652638f to
2c70ed9
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
…ual-cost EXPLAIN references Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| JoinKind join_kind, | ||
| std::vector<JoinActionRef *> & predicates) | ||
| { | ||
| auto selectivity = computeSelectivity(predicates, left->relations, right->relations); |
There was a problem hiding this comment.
evaluateJoin now shares the same per-predicate computeSelectivity cache across DPhyp hyperedge candidates, but that cache is keyed only by the JoinActionRef. For a complex equality such as (A + B) = (C + D), DPhyp can evaluate a different 4-way partition first; if dp_table does not yet contain the operand source sets {A,B} or {C,D}, getColumnStats returns 0 and computeSelectivity caches 1.0 for the predicate. When the valid {A,B} vs {C,D} split is evaluated later, it reuses that unselective cached value even though the composite DP entries now exist, so the hyperedge can be costed badly and lose the intended plan.
Please avoid caching selectivity when a composite operand's stats are missing, or include the operand source sets / stats availability in the cache key.
| SET use_statistics = 1; | ||
| SET query_plan_join_swap_table = 'auto'; | ||
| SET enable_join_runtime_filters = 0; | ||
| SET enable_parallel_replicas = 0; |
There was a problem hiding this comment.
This test relies on side-local predicates staying outside the join condition, but it never pins query_plan_merge_filter_into_join_condition. Under randomized settings that can be 1, and then the same a.val >= 50 shape takes the path covered by 03960_join_order_dphyp_merge_filter_fallback.sql: dphyp returns nullptr, so the dphyp-only queries here raise EXPERIMENTAL_FEATURE_ERROR instead of producing the expected counts.
Please set query_plan_merge_filter_into_join_condition = 0 in this file so it keeps testing filter-step behavior, while the dedicated fallback test covers the merged-ON behavior.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 355/383 (92.69%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 5 line(s) · Uncovered code |

DPhyp paper: Dynamic Programming Strikes Back (Moerkotte & Neumann, SIGMOD 2008)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added the experimental
dphypjoin reordering algorithm for inner joins as an option for thequery_plan_optimize_join_order_algorithmsetting, and thequery_plan_optimize_join_order_max_searched_planssetting, which bounds the join-order search and falls back to the next algorithm in the chain when the bound is exceeded; set it to0to keep the previous unbounded search behavior.Documentation entry for user-facing changes
Note
Medium Risk
Adds a new join reordering algorithm and refactors shared DP join-cost evaluation, which can change chosen join orders and performance characteristics on multi-join queries. Includes new fallback/limits behavior (e.g. disconnected graphs, unsupported predicates, >=64 relations) that could affect plan selection in edge cases.
Overview
Adds a new experimental
dphypoption toquery_plan_optimize_join_order_algorithm, implementing the DPhyp (hypergraph-based) dynamic-programming join reordering algorithm for inner joins, and wires it into the optimizer’s algorithm-fallback chain.Refactors join-order optimization internals by extracting shared plan evaluation (
evaluateJoin), clearingdp_tablebetween algorithms, and improvingBitSetutilities/perf (intersection/subset helpers, set-difference, and returning cached source-relation bitsets by reference).Introduces extensive stateless tests covering DPhyp correctness and fallback behavior across many join-graph shapes (chains, stars, cliques/cycles, caterpillars, hyperedges, transitive predicates, limits/over-limit, and algorithm-combination scenarios).
Reviewed by Cursor Bugbot for commit 94b6df1. Bugbot is set up for automated code reviews on this repo. Configure here.
Version info
26.6.1.1080