Optimize filtered constant columns#105541
Conversation
|
Workflow [PR], commit [13cc95c] Summary: ❌
AI ReviewSummaryThis PR teaches Findings❌ Blockers
Final Verdict
|
f18c917 to
ae0a620
Compare
Avoid replacing floating-point columns after `equals`, because `+0.0` and `-0.0` compare equal but have different bit representations. Materialize `Nullable` constant columns before passing them downstream to avoid invalid keys in grouping sets with `group_by_use_nulls`. PR: ClickHouse#105541 Tests: ninja -C build programs/clickhouse Tests: tests/clickhouse-test --no-random-settings --no-random-merge-tree-settings 04259_filter_constant_column_after_where Tests: local `03305_mergine_aggregated_filter_push_down` smoke with `127.0.0.{1,1}` because this macOS host has no `127.0.0.2` loopback alias
Cover additional `FilterTransform` cases for filtered `ColumnConst` columns, including convertible constants, non-input expressions, `Nullable`, `LowCardinality`, omitted filtered columns, partially omitted equality columns, and empty results.
Avoid replacing floating-point columns after `equals`, because `+0.0` and `-0.0` compare equal but have different bit representations. Materialize `Nullable` constant columns before passing them downstream to avoid invalid keys in grouping sets with `group_by_use_nulls`. PR: ClickHouse#105541 Tests: ninja -C build programs/clickhouse Tests: tests/clickhouse-test --no-random-settings --no-random-merge-tree-settings 04259_filter_constant_column_after_where Tests: local `03305_mergine_aggregated_filter_push_down` smoke with `127.0.0.{1,1}` because this macOS host has no `127.0.0.2` loopback alias
Skip the constant-column rewrite when an `equals` predicate constrains a `Decimal` column with a floating-point constant. Such comparisons can keep multiple distinct `Decimal` values after conversion through floating-point comparison paths, so replacing the result with one converted constant changes query-visible output.
ae0a620 to
34f9cd9
Compare
Make the `equals` safety check recursive over constrained and constant types before replacing filtered columns with `ColumnConst`. This skips rewrites when the constrained type contains `Float`, or when it contains `Decimal` and the constant side contains `Float`. Add stateless coverage for nested `Tuple(Float64)` and nested `Tuple(Decimal128)` predicates that must preserve distinct surviving values. Review: ClickHouse#105541 (comment)
Skip the constant-column rewrite when the constrained column type contains `Dynamic`, because runtime variants are not visible through static child-type traversal and can carry non-representation-unique values.
`equals` between `String` and `FixedString` can keep multiple distinct `String` payloads because of zero-padded `FixedString` comparison semantics. Do not replace the filtered `String` column with a single `ColumnConst` in that case
`MergingAggregated` can receive partially aggregated blocks after expression or filter steps. Numeric aggregation merge methods read key columns through raw data, so a key can be read past its single nested value and produce wrong grouping keys. Materialize only key columns before merge and keep aggregate state columns unchanged.
d6403a1 to
73f7811
Compare
Fix `04259_filter_constant_column_after_where` by fixing `query_plan_merge_filters` and `query_plan_optimize_lazy_materialization` after the explicit filter-merge coverage queries. The test checks `dumpColumnStructure`, so randomized filter splitting and lazy materialization can change whether a column is observed as `ColumnConst` or a full column without changing query semantics.
The `equals` function for `Variant` can keep rows with different active variants for the same literal value, such as `Int8(1)` and `UInt8(1)` in `Variant(Int8, UInt8)`. Rewriting the constrained column to one `ColumnConst` would collapse those rows to a single active variant. Skip the optimization when the constrained result type recursively contains `Variant`, and cover both direct `Variant` and nested `Tuple(Variant)` cases. Review: ClickHouse#105541 (comment) Tests: `ninja -C build src/CMakeFiles/dbms.dir/Processors/Transforms/FilterTransform.cpp.o > build/build_filter_transform_variant_guard_object_20260525.log 2>&1`
|
📊 Cloud Performance Report 🟢 AI verdict: This PR adds optimize_constant_columns_after_filter, which replaces a column with a constant after an equality WHERE predicate proves all surviving values are equal, cutting downstream materialization and aggregation work. ClickBench Q28 (-5.58%) is a clean, consistent win and sits squarely on that path: it filters on several equality predicates ahead of a heavy GROUP BY. The remaining flags are off-path and were downgraded: ClickBench Q4/Q15 are plain aggregations with no equality filter for the optimization to act on, and TPC-H Q4 only has date ranges and a semi-join, so its +8.79% is borderline and noisy rather than a real PR effect. clickbench🟢 1 improved · Flagged queries (3 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_officialFlagged queries (1 of 22)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. Debug info
|
… and process enum missing value correctly.
Keep one semantic check for the `AND` case while preserving the existing type-specific coverage.
Materialize runtime `ColumnConst` keys before `DistinctSortedTransform` chooses a hash method. Numeric hash methods expect row-addressable key storage, while filtered constants can be represented by a single nested value. Add an in-order `DISTINCT` regression that drives a const key produced after `WHERE` through this path.
…onst-column-filter
Avoid replacing filtered date/time-family columns with `ColumnConst` when the equality constant uses a different date/time-family type. Those comparisons use common-type promotion, so keeping the original full column avoids depending on one-way constant conversion. Add a stateless regression covering mixed `Date`, `Date32`, `Time`, `Time64`, `DateTime`, and `DateTime64` equality filters.
…onst-column-filter
Runtime `equals` converts `Enum` values to their names when comparing against `String`, while `tryConvertFieldToType` would stringify the underlying enum number. Skip this rewrite so filtered `String` columns are not replaced with a different `ColumnConst` value.
…onst-column-filter
`FilterTransform` can now leave filtered columns as `ColumnConst`. Convert them to full columns at the `MergeTree` part writer boundary so projection part serialization receives physical columns such as `ColumnString`.
…onst-column-filter
Add a focused regression for post-filter constant columns entering `MergeTree` part writes, including filtered projection part serialization.
| if (!canReplaceColumnWithConstantAfterFilter(result_column.type, constant_node->result_type)) | ||
| return {}; | ||
|
|
||
| auto converted = tryConvertFieldToType(*constant_field, *result_column.type, constant_node->result_type.get()); |
There was a problem hiding this comment.
equals on string literals is settings-sensitive, but this rewrite recomputes the RHS with default FormatSettings instead of the same parser the comparison used. That is a wrong-result bug, not just a missed optimization.
One concrete repro is Bool: WHERE b = '0' SETTINGS bool_true_representation='0', bool_false_representation='1' keeps rows where b = true, because SerializationBool checks the custom true/false spellings before the built-in special values. Here tryConvertFieldToType runs with default settings, so it parses '0' as the built-in false and rewrites the surviving b column to ColumnConst(false).
03803_const_string_comparison_cast_settings.sql already shows that comparison semantics depend on per-query format settings (bool_*_representation, cast_string_to_date_time_mode, input_format_null_as_default). Please either pass the query FormatSettings/comparison settings into this conversion or skip the rewrite for string RHS constants unless you can reuse the exact same parsing path.
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 170/173 (98.27%) · Uncovered code |

Optimizes FilterTransform to replace physical input columns with
ColumnConstafter filters that prove the column has a single value, such asWHERE x = 'hello'. This reduces work for later pipeline steps and lets downstream expressions observe the filtered column as constant.The optimization currently handles equality predicates through
ANDconditions and preserves conservative behavior forOR,NOT, and multi-value predicates.This PR partially fixes issue: #34326. It covers the
ColumnConstoptimization for columns constrained by equality filters, while the remainingLowCardinalitypart is left out because changing a filtered column toLowCardinalitymay alter the column type/representation contract which seems more risky.Changelog category (leave one):
Changelog entry:
Performance benchmark
I measured this locally with a
RelWithDebInfobuild. Each case usesclickhouse benchmark --iterations 10; the table reports the median (50%) latency.group_by_const_keys_after_filterSELECT count(), sum(v) FROM constant_column_after_filter WHERE k1 = 42 AND k2 = 43 AND k3 = 44 AND k4 = 45 AND k5 = 46 AND k6 = 47 AND k7 = 48 AND k8 = 49 GROUP BY k1, k2, k3, k4, k5, k6, k7, k8 FORMAT Null SETTINGS max_threads = 1, optimize_move_to_prewhere = 0, query_plan_optimize_prewhere = 0, query_plan_optimize_lazy_materialization = 00.941 sec0.792 sec15.8%fasterexpressions_on_const_columns_after_filterSELECT sum(length(toString(k1))), sum(length(toString(k2))), sum(length(toString(k3))), sum(length(toString(k4))), sum(length(toString(k5))), sum(length(toString(k6))), sum(length(toString(k7))), sum(length(toString(k8))) FROM constant_column_after_filter WHERE k1 = 42 AND k2 = 43 AND k3 = 44 AND k4 = 45 AND k5 = 46 AND k6 = 47 AND k7 = 48 AND k8 = 49 FORMAT Null SETTINGS max_threads = 1, optimize_move_to_prewhere = 0, query_plan_optimize_prewhere = 0, query_plan_optimize_lazy_materialization = 00.768 sec0.434 sec43.5%fasterNote
Medium Risk
Touches core query execution filtering and mutates output column representations, which could subtly affect type/nullable/LC behavior; additional guards reduce risk but correctness/perf impacts span many queries.
Overview
Filter optimization:
FilterTransformnow analyzes the filterActionsDAGand, forand-combinedequalspredicates against constants (e.g.x = 'hello'), records columns that become single-valued and rewrites those output columns toColumnConstafter filtering.The rewrite is applied on fast paths (all-rows-pass / constant-true filter) and after partial filtering, with safeguards to skip float-related cases and to avoid unsafe conversions (notably float/decimal comparisons), plus handling for nullable/low-cardinality-nullable columns.
Adds a new stateless test (
04259_filter_constant_column_after_where) covering aliases, multipleANDequalities, non-handledOR/NOT, nullable/low-cardinality cases, and float/decimal edge conditions.Reviewed by Cursor Bugbot for commit 9a6d76a. Bugbot is set up for automated code reviews on this repo. Configure here.