Support group_by_use_nulls in WITH TOTALS by groeneai · Pull Request #107106 · ClickHouse/ClickHouse · GitHub
Skip to content

Support group_by_use_nulls in WITH TOTALS#107106

Open
groeneai wants to merge 3 commits into
ClickHouse:masterfrom
groeneai:groeneai/rebase-39574-group-by-use-nulls-totals
Open

Support group_by_use_nulls in WITH TOTALS#107106
groeneai wants to merge 3 commits into
ClickHouse:masterfrom
groeneai:groeneai/rebase-39574-group-by-use-nulls-totals

Conversation

@groeneai

@groeneai groeneai commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

GROUP BY ... WITH TOTALS now respects the group_by_use_nulls setting when the analyzer is enabled (enable_analyzer = 1, the default): the GROUP BY keys in the TOTALS summary row are reported as NULL instead of their default values, consistent with GROUPING SETS, ROLLUP and CUBE. Keys whose types cannot be wrapped in Nullable (for example Array or Map) keep their default value. With the old query analyzer (enable_analyzer = 0) the keys keep their default value as before.

Description

Rebase and port of #39574 (original author @ novikd) on top of current master.

With group_by_use_nulls = 1, GROUP BY keys are already reported as Nullable for GROUPING SETS, ROLLUP and CUBE, but plain GROUP BY ... WITH TOTALS was left out: the keys stayed non-nullable and the TOTALS summary row showed 0 (the default value) instead of NULL.

The original PR fixed this in the old interpreter by converting the keys inside AggregatingStep. The analyzer is now the default, so this ports the fix to the analyzer planner:

  • PlannerExpressionAnalysis: include WITH TOTALS in the group_by_use_nulls condition so the result type of the keys is analyzed as Nullable.
  • Planner: insert a toNullable conversion for the GROUP BY keys right before the TotalsHaving step. The TOTALS row is filled from default values, so a Nullable key column yields NULL there. This also covers CUBE/ROLLUP WITH TOTALS, whose own nullable conversion happens after TotalsHaving and would otherwise leave the TOTALS row non-NULL. Keys that cannot be made Nullable are skipped.

Example (before this change the last row was 0):

SELECT number FROM numbers(10) GROUP BY number WITH TOTALS ORDER BY number SETTINGS group_by_use_nulls = 1;
-- 0 .. 9
--
-- \N

The behavior is analyzer-only, so the affected tests set enable_analyzer = 1, and the documentation/changelog qualify the behavior as applying only with the analyzer. Updated tests: 02343_group_by_use_nulls, 02535_analyzer_group_by_use_nulls, 03912_tuple_comparison_group_by_use_nulls, and the new 02373_group_by_use_nulls_with_totals (from the original PR).

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @novikd @vdimir — could you review this? It rebases #39574 and ports the group_by_use_nulls + WITH TOTALS fix to the analyzer: the TOTALS summary row now reports GROUP BY keys as NULL (matching GROUPING SETS/ROLLUP/CUBE) by typing the keys Nullable in PlannerExpressionAnalysis and adding a toNullable conversion just before the TotalsHaving step. Original authorship credited to @novikd in the commit.

@pufit pufit added the can be tested Allows running workflows for external contributors label Jun 11, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [2c47bef]

Summary:

job_name test_name status info comment
AST fuzzer (amd_debug, targeted) FAIL
Logical error: Bad cast from type A to B (STID: 1499-3ec4) FAIL cidb

AI Review

Summary

This PR makes analyzer plans for GROUP BY ... WITH TOTALS honor group_by_use_nulls by aligning query-tree key typing with planner output and converting real grouping keys before TotalsHaving. The current patch addresses the prior review concerns around IdentifierResolveScope, GROUP BY ALL, folded constants, non-Nullable key types, and analyzer-only documentation; I did not find remaining blockers or major issues.

Missing context / blind spots
  • ⚠️ Full CI was still pending at review time for SHA 2c47befaee5fcfcd800e2de8dda898a549e128c0; the available ClickHouse CI report showed no failed tests yet.
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Jun 11, 2026
Comment thread src/Planner/PlannerExpressionAnalysis.cpp
Comment thread src/Planner/Planner.cpp
@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch 2 times, most recently from 7507ca2 to a54354e Compare June 11, 2026 02:28
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (updated after addressing review)

# Question Answer
a Deterministic repro? Yes. SELECT number AS k, k IS NULL, count() FROM numbers(3) GROUP BY k WITH TOTALS ORDER BY k SETTINGS group_by_use_nulls = 1, enable_analyzer = 1 deterministically produced \N 0 3 in the TOTALS row before the fix (and toTypeName(k) reported UInt64); the original symptom was the TOTALS key printing 0 instead of NULL.
b Root cause explained? Two gates select which GROUP BY modifiers get group_by_use_nulls handling and both listed only GROUPING SETS / ROLLUP / CUBE. (1) The planner never typed plain-TOTALS keys Nullable nor inserted a conversion, so the TOTALS row (built from insertDefaultInto) emitted 0. (2) IdentifierResolveScope never enabled group_by_use_nulls for plain TOTALS, so the query tree resolved keys and expressions over them as non-nullable, making k IS NULL fold to 0 and toTypeName(k) report the non-nullable type.
c Fix matches root cause? Yes. Adds isGroupByWithTotals() to both gates (IdentifierResolveScope and PlannerExpressionAnalysis) and inserts a toNullable conversion of the GROUP BY keys right before TotalsHaving in Planner. The analyzer types the contract, the planner makes the runtime data match it.
d Test intent preserved / new tests added? New test 02373_group_by_use_nulls_with_totals covers plain / mixed / constant-key cases including type observation via k IS NULL. References for 02343, 02535, 03912 updated to the corrected TOTALS rows. No existing assertions weakened.
e Both directions demonstrated? Yes. Unpatched master build: TOTALS row 0 / k IS NULL = 0 / toTypeName = UInt64. Patched build: TOTALS row \N / k IS NULL = 1 / toTypeName = Nullable(UInt64), matching CUBE/ROLLUP WITH TOTALS.
f Fix is general, not a narrow patch? Yes. The fix targets the two shared gates that all GROUP BY modifiers flow through, so it stays consistent with the existing GROUPING SETS / ROLLUP / CUBE handling rather than being a one-off. A/B run of every local-runnable totals/rollup/cube/grouping_sets stateless test against an unpatched master build: 121 identical, only the 4 intended group_by_use_nulls + WITH TOTALS tests differ. Folded constant keys are documented as intentionally outside the NULL contract (uniform with the other modifiers).

Session id: cron:clickhouse-worker-slot-15:20260611-020200

@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch from a54354e to a458d3e Compare June 11, 2026 03:39
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Re-run for the test coverage update (a458d3e), which adds expression-level cases to 02373_group_by_use_nulls_with_totals per the review comment.

# Question Answer
a Deterministic repro? Yes. On a build with only the IdentifierResolveScope hunk reverted, clickhouse-test 02373_group_by_use_nulls_with_totals fails deterministically; the diff is confined to the new expression-level lines (toTypeName(k): Nullable(UInt64)UInt64, k IS NULL in the TOTALS row: 10).
b Root cause explained? The fix has two layers: the Planner step inserts a runtime toNullable so the TOTALS row carries a NULL value, and IdentifierResolveScope/PlannerExpressionAnalysis type the key as Nullable in the query tree. The old test only asserted the raw value, which the Planner layer alone produces, so the query-tree typing layer was unprotected: toTypeName(k) and k IS NULL are resolved against the analyzed type, so without the scope change they report the non-nullable type and fold IS NULL to 0.
c Fix matches root cause? The change is test-only. It adds toTypeName(k), k IS NULL, and a wrapping-subquery case that depend on the analyzed (query-tree) type, directly covering the IdentifierResolveScope layer the bot flagged. No source change.
d Test intent preserved / new tests added? Preserved and extended. The pre-existing raw-value, pure-constant-key, and mixed-key cases are unchanged; three expression-level assertions are added on top.
e Both directions demonstrated? Yes. With the full fix: 30/30 passes under --test-runs 30 (randomization on) and 20/20 on a follow-up run. With only the IdentifierResolveScope hunk reverted (Planner + PlannerExpressionAnalysis kept): the runner reports [ FAIL ] with the diff isolated to the new expression-level lines, while the raw-value lines still match. Binary identity verified by Build ID (fix 706cc63… vs reverted a38aebcd…).
f Fix is general, not a narrow patch? N/A for the source fix in this round (unchanged). The underlying fix already covers all three layers (scope typing, expression analysis result type, planner runtime conversion) and the analogous GROUPING SETS/ROLLUP/CUBE paths; this round only adds the regression test that guards the scope-typing layer.

Session id: cron:clickhouse-worker-slot-31:20260611-030000

Comment thread src/Planner/PlannerExpressionAnalysis.cpp
@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch from a458d3e to 878b0ce Compare June 11, 2026 04:28
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (docs follow-up, head 878b0ce)

This commit is a docs-only follow-up to the clickhouse-gh[bot] change-request: align the user-facing docs with the new WITH TOTALS + group_by_use_nulls behavior. No code logic changed (only the Settings.cpp raw-string setting description and a Markdown doc).

# Question Answer
a Deterministic repro? N/A (docs-only change; no runtime behavior altered). The behavior being documented is already deterministically covered by test 02373_group_by_use_nulls_with_totals (raw key value, pure constant key, mixed real+constant key, toTypeName(k)/k IS NULL, wrapping subquery).
b Root cause explained? The PR makes real GROUP BY keys Nullable and reported as NULL in the WITH TOTALS summary row under group_by_use_nulls = 1, but the docs still described the old contract: the group_by_use_nulls setting text in src/Core/Settings.cpp listed only ROLLUP/CUBE/GROUPING SETS, and the WITH TOTALS section in group-by.md still said key columns hold default values.
c Fix matches root cause? Yes. Both flagged docs now state the new behavior and the folded-constant-key exception: Settings.cpp adds WITH TOTALS to the modifier list plus a sentence on the NULL/Nullable totals-row behavior; group-by.md WITH TOTALS section adds the same with a link to the setting.
d Test intent preserved / new tests added? N/A for docs. The doc statements match existing test 02373 (constant key keeps its value; real key becomes NULL). No test change in this follow-up.
e Both directions demonstrated? N/A (docs-only). Verified src/Core/Settings.cpp still compiles after editing the raw-string description (src/CMakeFiles/dbms.dir/Core/Settings.cpp.o built clean).
f Fix is general, not a narrow patch? Yes. Audited all docs mentioning WITH TOTALS/group_by_use_nulls: settings.md has no separate prose (auto-generated from Settings.cpp); the format docs (Pretty/JSON/TabSeparated/Template, index.md) only describe totals-row output formatting, not the key-column value contract. The two places that describe that contract are exactly the two updated.

Session id: cron:clickhouse-worker-slot-4:20260611-042300

Comment thread src/Analyzer/Resolve/IdentifierResolveScope.cpp
@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch from 878b0ce to a7eba4d Compare June 11, 2026 06:05
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

For the GROUP BY ALL WITH TOTALS fix (commit a7eba4d):

# Question Answer
a Deterministic repro? Yes. SELECT k, toTypeName(k) FROM (SELECT number AS k FROM numbers(3) GROUP BY ALL WITH TOTALS) WHERE k IS NULL SETTINGS group_by_use_nulls = 1, enable_analyzer = 1; threw NOT_AN_AGGREGATE on every run on the PR head before this fix; SELECT number AS k, toTypeName(k), k IS NULL FROM numbers(3) GROUP BY ALL WITH TOTALS ORDER BY k SETTINGS group_by_use_nulls = 1 reproduced it directly.
b Root cause explained? Yes. With group_by_use_nulls, resolveQuery skips early projection resolution and resolveGroupByNode does not run for GROUP BY ALL (its key list is empty until expandGroupByAll, which runs near the end of resolveQuery). So no keys are registered in nullable_group_by_keys, the projection is typed non-nullable, and validateAggregates compares it against keys it converted to Nullable, raising NOT_AN_AGGREGATE. The new isGroupByWithTotals gate routed ordinary WITH TOTALS (incl. GROUP BY ALL) onto this delayed-resolution path, exposing the gap.
c Fix matches root cause? Yes. resolveQuery now expands GROUP BY ALL before resolveGroupByNode registers the nullable keys when group_by_use_nulls is set, so the keys are registered and the projection is typed Nullable. No widened bounds or suppression tags.
d Test intent preserved / new tests added? Yes. Added two GROUP BY ALL WITH TOTALS cases to 02373_group_by_use_nulls_with_totals (the bot's wrapping subquery, and toTypeName(k) / k IS NULL), guarding the expansion-order fix. No existing assertions weakened.
e Both directions demonstrated? Yes. PR head before fix: NOT_AN_AGGREGATE. With fix: \N / Nullable(UInt64), matching explicit GROUP BY k WITH TOTALS. 02373 matches its reference exactly, 30/30 with randomization; sibling tests 02343/02535/03912 and GROUP BY ALL suite tests 02459/03298/03413/03453 still pass.
f Fix is general, not a narrow patch? Yes. Fix is at the resolution-order source, not a guard at the symptom. It covers the whole GROUP BY ALL modifier family: GROUP BY ALL WITH ROLLUP/CUBE + group_by_use_nulls (which hit the same NOT_AN_AGGREGATE even on master) now also work. Explicit GROUP BY and group_by_use_nulls = 0 paths are unchanged.

Session id: cron:clickhouse-worker-slot-8:20260611-051100

Comment thread docs/en/sql-reference/statements/select/group-by.md Outdated
@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch from a7eba4d to 92e774a Compare June 11, 2026 06:43
Rebase and port of ClickHouse#39574 (original author @novikd) to the analyzer.

With group_by_use_nulls enabled, GROUP BY keys are reported as Nullable
for GROUPING SETS, ROLLUP and CUBE, but plain GROUP BY ... WITH TOTALS was
left out: the keys stayed non-nullable and the TOTALS summary row showed 0
instead of NULL.

The original PR fixed this in the old interpreter by converting the keys
inside AggregatingStep. The analyzer is now the default and needs the same
behavior, so this ports the fix to the analyzer in two places that gate the
group_by_use_nulls handling on the GROUP BY modifier and previously listed
only GROUPING SETS / ROLLUP / CUBE:

* IdentifierResolveScope: include WITH TOTALS so the query tree resolves the
  GROUP BY keys (and expressions over them) as Nullable. Without this the
  planner-side conversion below produces a Nullable key column at runtime
  while query-tree analysis still treats the key as non-nullable, e.g.
  'k IS NULL' over the TOTALS row folds to 0 and toTypeName(k) reports the
  non-nullable type, and a wrapping subquery sees the wrong key type.
* PlannerExpressionAnalysis: include WITH TOTALS so the result type of the
  keys is analyzed as Nullable, consistent with the scope above.
* Planner: insert a toNullable conversion for the GROUP BY keys right before
  the TotalsHaving step. The TOTALS row is built from default values, so a
  Nullable key column yields NULL there. This runs for CUBE/ROLLUP WITH
  TOTALS as well, because their own nullable conversion happens after
  TotalsHaving and would otherwise leave the TOTALS row non-NULL. Keys that
  cannot be made Nullable are skipped.

GROUP BY ALL needs an extra step. Its key list is empty until expandGroupByAll
collects the keys from the projection, and that normally runs near the end of
resolveQuery, after resolveGroupByNode is the only place that registers keys
in nullable_group_by_keys. So with the new WITH TOTALS gate, GROUP BY ALL took
the delayed-resolution path with no registered keys: the projection was typed
non-nullable while validateAggregates compared it against keys it had converted
to Nullable, raising NOT_AN_AGGREGATE. resolveQuery now expands GROUP BY ALL
before resolveGroupByNode when group_by_use_nulls is set. The expansion uses a
throwaway clone of the projection so the real projection keeps its aliases and
is not constant-folded against the pre-nullable types; the clone's resolved key
nodes still match the real keys structurally, because nullable_group_by_keys
ignores aliases and types. GROUP BY ALL WITH TOTALS then behaves exactly like
the explicit GROUP BY case, and GROUP BY ALL WITH ROLLUP/CUBE (which had the
same NOT_AN_AGGREGATE error before) works too.

Constant GROUP BY keys are folded away before aggregation and are not real
key columns, so they keep their constant value in the TOTALS row instead of
becoming NULL. This is the same as CUBE/ROLLUP/GROUPING SETS WITH TOTALS,
where a folded constant key likewise stays constant in the summary rows.

02373 covers the raw key value in the TOTALS row, a pure constant key, and a
mixed real plus constant key. It also covers the expression level so the
IdentifierResolveScope typing is protected: toTypeName(k) over the WITH
TOTALS result, 'k IS NULL' which is 1 in the TOTALS row, and a wrapping
subquery that selects the TOTALS row with WHERE k IS NULL. Without the
IdentifierResolveScope change these report the non-nullable type and fold
'k IS NULL' to 0 while the raw value cases still pass, so they are the
regression guard for that part of the fix. The same expression-level cases
are added for GROUP BY ALL WITH TOTALS as the regression guard for the
expansion-order fix above.

Documentation: the group_by_use_nulls setting text in src/Core/Settings.cpp
and the WITH TOTALS section in docs/en/sql-reference/statements/select/group-by.md
now state that with group_by_use_nulls = 1 the real GROUP BY keys in the WITH
TOTALS row are reported as NULL, qualified to only keys whose types can be
wrapped in Nullable. Keys whose types cannot be made Nullable (for example
Array and Map, which fail canBeInsideNullable) keep their default value, as
does a folded constant key.

The behavior is analyzer-only; the affected tests set enable_analyzer = 1.

Co-authored-by: Dmitry Novik <novikd@clickhouse.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai groeneai force-pushed the groeneai/rebase-39574-group-by-use-nulls-totals branch from 92e774a to 0d313cb Compare June 11, 2026 06:44
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

This round addresses the clickhouse-gh[bot] change-request on docs/.../select/group-by.md (review comment 3393695403): the docs over-promised that every real GROUP BY key becomes NULL in the totals row. Docs-only delta, no behavior change.

# Question Answer
a Deterministic repro? Yes (behavioral confirmation of the gap the docs mis-stated). Array/Map keys with group_by_use_nulls=1, enable_analyzer=1: SELECT [number] AS k, toTypeName(k) FROM numbers(3) GROUP BY k WITH TOTALS -> totals row [] Array(UInt64) (default, not NULL) on every run; scalar UInt64 key -> \N Nullable(UInt64).
b Root cause explained? addConvertGroupByKeysToNullableStep (src/Planner/Planner.cpp:843) only inserts toNullable for keys where removeLowCardinality(type)->canBeInsideNullable() is true. DataTypeArray::canBeInsideNullable() and DataTypeMap::canBeInsideNullable() return false, so those keys keep their default in the totals row. The docs claimed all real keys become NULL, which is inaccurate for these types.
c Fix matches root cause? Yes. Docs now state only keys whose types can be wrapped in Nullable are converted/reported as NULL; Array/Map (and other non-canBeInsideNullable types) keep their default value. Matches the canBeInsideNullable gate exactly.
d Test intent preserved / new tests added? N/A (docs-only). The runtime behavior is unchanged and already covered by 02373_group_by_use_nulls_with_totals for the convertible-key case. No code change, so no new regression test needed.
e Both directions demonstrated? N/A bidirectional (docs-only, no behavior change). Empirically verified the documented behavior matches the implementation (Array/Map keep default; scalar keys become NULL), shown in (a).
f Fix is general, not a narrow patch? Yes. Both places stating the totals-row key-value contract are qualified: docs/en/sql-reference/statements/select/group-by.md (WITH TOTALS section) and src/Core/Settings.cpp (group_by_use_nulls text, from which settings.md is auto-generated). Sibling audit: the format docs (Pretty/JSON/TabSeparated/Template, select/index.md) describe only totals-row output formatting, not the key value contract, so no further change needed.

Session id: cron:clickhouse-worker-slot-2:20260611-063900

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on 0d313cb726ca. The single non-passing functional check is a known-flaky test unrelated to this PR:

  • Integration tests (amd_tsan, 6/6) / test_profile_max_sessions_for_user_postgres - chronic flaky (15 distinct PRs + 2 master hits in the last 30 days). This PR is analyzer-only (group_by_use_nulls + WITH TOTALS) and does not touch session limits or PostgreSQL integration. Tracked separately; a dedicated fix PR (Fix flaky timeout in test_profile_max_sessions_for_user (postgres) #107086) is already in review. Not caused by this PR.

The remaining red marks (Mergeable Check, PR) are derivative aggregators of the above. All other checks pass.

Session: cron:our-pr-ci-monitor:20260611-120000

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished on HEAD 0d313cb726ca (Finish Workflow pass). The single failure — test_profile_max_sessions_for_user/test.py::test_profile_max_sessions_for_user_postgres in Integration tests (amd_tsan, 6/6) — is a known chronic flaky timeout (CIDB: 16 hits across 15 PRs, incl. 2 master, in 30d), unrelated to this group_by_use_nulls / WITH TOTALS change. It is already tracked (a separate fix PR is in flight). No PR-caused failures. Moving to review.

Session id: cron:our-pr-ci-monitor:20260611-160000

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, as the bad integration test is fixed, update the branch in this PR, so we will have a chance to merge it.

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov done. Branch updated to the latest master (merged 369a1269e02c); PR head is now 259fa05035ed. Clean merge, no conflicts; the 3-part analyzer/planner fix and the WITH TOTALS / GROUP BY ALL test cases are intact. Builds clean (debug, clang-21) and the group_by_use_nulls + WITH TOTALS / GROUP BY ALL repros and tests pass locally. CI is re-running.

Comment thread docs/en/sql-reference/statements/select/group-by.md Outdated
The group_by_use_nulls conversion for WITH TOTALS keys is implemented only
in the analyzer planner (PlannerExpressionAnalysis / Planner). With the old
query interpreter (enable_analyzer = 0) the keys are not converted, and the
totals row keeps the default key value. Qualify the group-by.md WITH TOTALS
section and the group_by_use_nulls setting text in Settings.cpp so the
documented public contract matches the analyzer-only behavior.

Verified: enable_analyzer=1 totals key = NULL / Nullable(UInt64);
enable_analyzer=0 totals key = 0 / UInt64 (unchanged).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Docs/changelog-only change (qualify WITH TOTALS + group_by_use_nulls as analyzer-only) addressing the clickhouse-gh[bot] review on group-by.md.

# Question Answer
a Deterministic repro? Yes. SELECT number FROM numbers(3) GROUP BY number WITH TOTALS SETTINGS group_by_use_nulls = 1, enable_analyzer = 0 deterministically returns the default key 0 (UInt64) in the totals row, while enable_analyzer = 1 returns NULL (Nullable(UInt64)). The docs previously promised NULL unconditionally.
b Root cause explained? Yes. The group_by_use_nulls totals conversion is implemented only in the analyzer planner (PlannerExpressionAnalysis result typing + Planner toNullable before TotalsHaving). With enable_analyzer = 0, execution goes through InterpreterSelectQuery, which applies the conversion only for grouping sets, so plain WITH TOTALS keeps the non-nullable default key. The docs/changelog described the analyzer behavior as unconditional.
c Fix matches root cause? Yes. Qualifies the public contract (group-by.md, Settings.cpp group_by_use_nulls text, changelog entry) to state the conversion applies with enable_analyzer = 1 (the default) and that the old interpreter keeps the default value. The implementation scope is unchanged; the docs now match it.
d Test intent preserved / new tests added? N/A for new tests (docs/changelog-only). Existing analyzer coverage in 02373_group_by_use_nulls_with_totals (pinned enable_analyzer = 1) is unchanged and still protects the analyzer behavior the docs now describe.
e Both directions demonstrated? N/A (no behavior change). The documented contract was verified empirically in both directions on a local debug build (Build ID 287c79c3): analyzer=1 totals key NULL/Nullable(UInt64); analyzer=0 totals key 0/UInt64.
f Fix is general, not a narrow patch? Yes. Sibling-path audit: the totals-row key-value contract is described only in group-by.md and the auto-generated group_by_use_nulls setting text (from Settings.cpp); settings.md is generated from Settings.cpp so no manual edit there. Format docs (Pretty/JSON/TabSeparated, select/index.md) describe only totals-row output formatting, not the key value contract. Both contract places + the changelog were qualified.

Session id: cron:clickhouse-worker-slot-1:20260613-144000

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov Branch is on the latest master. CI is now green except AST fuzzer (amd_debug, targeted), and that failure is a pre-existing bug, not introduced by this PR.

The fuzzer hit Logical error: 'Bad cast from type DB::ColumnNullable to DB::ColumnVector<unsigned long>' on a correlated scalar subquery that references an outer GROUP BY key. That is issue #91119: when group_by_use_nulls = 1 makes a GROUP BY key Nullable, a correlated subquery referencing that key keeps the key's non-Nullable type in the analyzer, but decorrelation later feeds it the post-aggregation Nullable column, so the aggregate built for UInt64 asserts on a ColumnNullable. This PR does not cause it: it only adds WITH TOTALS as another way to make the key Nullable. The same crash reproduces on master today for WITH ROLLUP/CUBE/GROUPING SETS.

Evidence:

#104350 ("Fix correlated subquery + GROUP BY ROLLUP under group_by_use_nulls", issue #91119) is the dedicated fix and is modifier-agnostic. Folding that analyzer change into this PR would duplicate and conflict with #104350 (both edit the same nullable_group_by_keys scope walk).

My recommendation is to treat this fuzzer finding as #91119 (tracked, fix in #104350) and not block this PR on it, or to land #104350 first. How would you like to proceed?

@groeneai

Copy link
Copy Markdown
Contributor Author

@novikd
The AST fuzzer LOGICAL_ERROR on this PR is a pre-existing master bug in correlated-subquery decorrelation, not a regression from this change. It needs a scope decision.

Crash: Bad cast from type ColumnNullable to ColumnVector<UInt64> in the aggregator (AggregateFunctionDistinct/sum add()assert_cast<ColumnVector<UInt64>> on a ColumnNullable).

Minimal repro (this PR, WITH TOTALS):

SELECT (SELECT sum(number)), number FROM numbers(10) GROUP BY number WITH TOTALS SETTINGS group_by_use_nulls = 1;

Same crash on pure master (no commits from this PR), CUBE/ROLLUP:

SELECT (SELECT sum(number)), number FROM numbers(10) GROUP BY CUBE(number)   SETTINGS group_by_use_nulls = 1;
SELECT (SELECT sum(number)), number FROM numbers(10) GROUP BY ROLLUP(number) SETTINGS group_by_use_nulls = 1;

Verified on master 40f84e4 (debug). GROUPING SETS crashes the same way. count() (no argument) and the aggregate written directly without a subquery both pass; the trigger is specifically a correlated subquery whose aggregate takes the nullified GROUP BY key as an argument. allow_experimental_correlated_subqueries = 0 avoids it.

Root cause: decorrelateQueryPlan (src/Planner/PlannerCorrelatedSubqueries.cpp:205-206) binds the correlated column from the outer plan's current header, i.e. after the AggregatingStep where group_by_use_nulls has already converted the key to Nullable(UInt64). That nullable column is fed as the join key into the decorrelated subquery's own AggregatingStep (:368-372), whose sum/sumDistinct were resolved as sum(UInt64)→UInt64, so the runtime assert_cast to the non-nullable column aborts. This PR doesn't add the bug; extending group_by_use_nulls to WITH TOTALS just makes WITH TOTALS reach the same latent decorrelation path that CUBE/ROLLUP/GROUPING SETS already hit. CIDB shows only 2 such failures in 90 days (rare shape); no existing issue.

Since you own the decorrelation code and this feature, your call on scope:

a. Land WITH TOTALS here and fix the general decorrelation + group_by_use_nulls bug separately (it covers CUBE/ROLLUP/GROUPING SETS too).
b. Fix it in this PR — convert/match the decorrelated aggregate's argument type to the nullified correlated key at PlannerCorrelatedSubqueries.cpp:205-206. There's a semantic question for the super-aggregate NULL-key row (what the correlated aggregate should see), which is why I'd rather not pick the contract unilaterally.
c. Gate WITH TOTALS + group_by_use_nulls behind the same path so it can't reach decorrelation until the engine fix lands.

I have the repro and the type trace ready and can implement whichever you prefer.

@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.40% 92.40% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 56/60 (93.33%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants