Optimize AND comparison chains: detect conflicts, prune redundancies#99736
Optimize AND comparison chains: detect conflicts, prune redundancies#99736wudidapaopao wants to merge 82 commits into
Conversation
…chaining and reorder passes to detect transitive equality conflicts
…ains in LogicalExpressionOptimizerPass
|
Workflow [PR], commit [34bd051] Summary: ❌
AI ReviewSummaryThis PR meaningfully improves Findings❌ Blockers
Final VerdictStatus: |
e73fa86 to
37402ba
Compare
nihalzp
left a comment
There was a problem hiding this comment.
Thanks for working on it!
Let's place the optimization behind a setting which we can enable by default.
After the setting is added, let's duplicate the changed queries in the existing tests once with setting on and once with setting off. This will make sure we do not lose coverage.
|
Additionally, you may look at PR #98516 which contains some ideas that can be reused to improve cross-type comparison. |
…parisons in `LogicalExpressionOptimizerPass`
…on chain analysis
…story and disable it in 01625 constraint test
…ces to single operand
|
Hi @nihalzp, thanks for the suggestions! |
| SELECT 'eq_eq_same'; | ||
| SELECT * FROM 04032_t WHERE i = 3 AND i = 3 ORDER BY i SETTINGS optimize_and_compare_chain_pruning = 0; | ||
| SELECT * FROM 04032_t WHERE i = 3 AND i = 3 ORDER BY i SETTINGS optimize_and_compare_chain_pruning = 1; | ||
| EXPLAIN QUERY TREE SELECT * FROM 04032_t WHERE i = 3 AND i = 3 SETTINGS optimize_and_compare_chain_pruning = 0; |
There was a problem hiding this comment.
Instead of using EXPLAIN QUERY TREE, let's use EXPLAIN SYNTAX run_query_tree_passes = 1 which will give us a more concise output while still showing us that optimization is applied.
There was a problem hiding this comment.
Hi @nihalzp, I’ve switched to using EXPLAIN SYNTAX run_query_tree_passes = 1 per request. A single CI error is present but not related to this change.
The contradictory range c0 <= 1 AND c0 >= 3 must stay intact for EXPLAIN indexes to exercise primary key analysis; folding it to false breaks the reference.
Settings before -- { echoOn } are not echoed, so the reference stays unchanged.
…6.7 block After merging `master`, the current development version rolled from 26.6 to 26.7, so the new setting's `SettingsChangesHistory` entry must live in the 26.7 block instead of the now-released 26.6 block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
|
||
| const auto & not_equals_arguments = not_equals_function->getArguments().getNodes(); | ||
| if (const auto * rhs_literal = not_equals_arguments[1]->as<ConstantNode>()) | ||
| node = std::make_shared<ConstantNode>(0u, function_node.getResultType()); |
There was a problem hiding this comment.
This conflict fold only checks for non-convertible comparison filters, but it can also drop non-comparison operands that the original and must evaluate before short-circuiting. For example, WHERE toUInt8(s) AND i = 1 AND i = 2 with s = 'x': and evaluates argument 0 before building the lazy mask, so the query raises CANNOT_PARSE_TEXT. This branch sees the i = 1 / i = 2 conflict and replaces the whole predicate with false, so the exception disappears when optimize_redundant_comparisons = 1.
Please only fold to false after proving no earlier/sibling operand can throw, or restrict this rewrite to cases where all operands being dropped are safe plain comparisons.
…nstants
`tryConvertToColumnType` converts a comparison constant to the column type with a
strict (lossless) conversion and lets the chain pruning fold/prune only when the
conversion succeeds. `strict` is supposed to reject any lossy conversion by returning
a null `Field`, but `convertFieldToType` does not honour that for `DateTime64`/`Time64`
scale reduction: it silently truncates a higher-scale value to a lower scale (e.g.
`1.23` of scale 2 becomes `1.2` of scale 1).
As a result the optimizer folded a comparison chain using a value different from the
original constant, changing query results versus the unoptimized query. For a
`DateTime64(1)` column,
`dt = toDateTime64('1970-01-01 00:00:01.20', 1) AND dt != toDateTime64('1970-01-01 00:00:01.23', 2)`
collapsed both constants to `1.2`, detected `equals` vs `notEquals` as a contradiction,
and folded the whole `AND` to `false` — dropping the row `1.20`, which must be kept
because `1.20 != 1.23`. The setting is enabled by default, so this affected results
silently.
Guard `tryConvertToColumnType` by requiring the strict conversion to be exactly
reversible, checked only for decimal-backed results (`DateTime64`/`Time64`/`Decimal`)
where this truncation can occur, leaving the already-correct string/integer/float
conversions untouched. Lossless finer-scale constants (e.g. `1.20` of scale 2 in a
`DateTime64(1)` column) are still optimized.
Addresses an unresolved review finding on
ClickHouse#99736
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed Fixed (wrong results): the Left for a maintainer design decision (not changed): the remaining open |
|
Continued this PR. The branch is 1.
|
|
@alexey-milovidov you are right. STID This PR is not involved. The crash reproduces with Minimal deterministic trigger (binary-searched from the crash dump's settings): The trigger is
The clean- This is the same assertion as #100422 but a distinct upstream rewrite. #104890 fixed the Session id: cron:clickhouse-worker-slot-7:20260630-101200 |
# Conflicts: # src/Analyzer/Passes/LogicalExpressionOptimizerPass.cpp # src/Core/Settings.cpp
|
Merged the latest Two conflicts, both from
Verified both conflict files compile ( |
…s DateTime constants
The comparison-chain pruning converts a comparison constant to the column type with
a strict (lossless) conversion and only folds/prunes when it succeeds. The reversibility
guard added for `DateTime64`/`Time64` scale reduction only checked decimal-backed results,
but `convertFieldToType` also silently truncates `DateTime`/`DateTime64` -> `Date`/`Date32`
(dropping the intra-day part) and `Float64` -> `Float32`, without honouring `strict`.
For a `Date` column, `d = toDate('2024-01-01') AND d != toDateTime('2024-01-01 12:34:56')`
is satisfiable by `d = '2024-01-01'`, because `Date`/`DateTime` comparisons are evaluated
in the `DateTime` domain (`d` promotes to `2024-01-01 00:00:00`, which differs from
`12:34:56`). But the `DateTime` constant truncated to `2024-01-01`, so the optimizer saw
`equals` vs `notEquals` on the same converted value and folded the whole `AND` to `false`,
dropping the row.
Apply the round-trip reversibility check in `tryConvertToColumnType` to all non-string
constants instead of only decimal-backed ones: a lossless conversion round-trips exactly,
so any lossy narrowing is now rejected and the optimization is conservatively skipped
(which only forgoes a fold and never changes results). String constants are excluded
because they are parsed directly at the column resolution (so never carry finer resolution)
and round-tripping them through the string rendering of a number/date would spuriously fail
(e.g. `Float64` `3.0` renders as `"3"`).
Add test `04498_redundant_comparisons_date_datetime_truncation`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed 0badf99 to fix the remaining wrong-results case flagged by the AI review.
Added |
|
Merged |
…_redundant_comparisons Address the AI review Major on `LogicalExpressionOptimizerPass.cpp:422`: the round-trip reversibility guard in `tryConvertToColumnType` is the correctness barrier for several lossy `strict` conversions, but the existing regressions only cover `DateTime64` scale reduction (`04489`) and `DateTime` -> `Date`/`Date32` (`04498`). This adds `04499_redundant_comparisons_lossy_conversions` with focused `optimize_redundant_comparisons = 0/1` equivalence tests for the remaining guarded surfaces: - `Time64` scale reduction, - `DateTime64` -> `Date`/`Date32` truncation, - `Float64` -> `Float32` narrowing. Each scenario asserts the same result with the optimization enabled and disabled: the lossy "keep the row" cases stay at their unoptimized result (the guard skips the fold), while the lossless cases still fold. Verified locally against a build of this branch. Review: ClickHouse#99736 (comment)
|
Addressed the AI review Major on It adds focused
Each case asserts the same result with the optimization enabled and disabled; verified locally against a Debug build of this branch (every enabled/disabled pair matches). The remaining AI blockers are the exception-suppression axis (folding a chain skips evaluating a deterministic expression that would otherwise throw), which is the previously-documented maintainer design decision, not wrong results. |
|
Merged the latest The only non-flaky red on the previous tip was The merge was clean (no conflicts); the two core source files ( |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 500/521 (95.97%) · Uncovered code |

Resolved #90081
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Optimize AND chains with multiple comparison conditions on the same expression: detect contradictions (e.g.,
a < 3 AND a > 5→false), prune redundant conditions (e.g.,a = 3 AND a < 5→a = 3).Documentation entry for user-facing changes
Summary
Extend
LogicalExpressionOptimizerPassto analyze and simplify AND chains of comparison conditions on the same expression (e.g.,a = 3 AND a < 5 AND a != 7). The optimizer now detects contradictions (rewrites tofalse), prunes redundant conditions, and handles cross-type constant comparisons viaconvertFieldToTypeStrict.Optimizations and fixes
i = 3 AND i < 5i = 3i = 3 AND i < 2falsei = 3 AND i != 5i = 3i = 3 AND i != 3falsei != 3 AND i < 3i < 3i < 5 AND i < 3i < 3i < 3 AND i >= 3falsei = 3 AND i = 3.0i = 3x = 3 AND x = y AND y = 5falsex < 3 AND y > 3 AND y < 10