Fix logical error in correlated subqueries with group_by_use_nulls by coderashed · Pull Request #108714 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix logical error in correlated subqueries with group_by_use_nulls#108714

Closed
coderashed wants to merge 2 commits into
ClickHouse:masterfrom
coderashed:coderashed/fix-correlated-subquery-group-by-nulls
Closed

Fix logical error in correlated subqueries with group_by_use_nulls#108714
coderashed wants to merge 2 commits into
ClickHouse:masterfrom
coderashed:coderashed/fix-correlated-subquery-group-by-nulls

Conversation

@coderashed

@coderashed coderashed commented Jun 27, 2026

Copy link
Copy Markdown

Closes: #107445
Closes: #106377
Related: #107951
Related: #108685
Related: #104350

Fixes a logical error in correlated scalar subqueries when an outer GROUP BY key is made Nullable by group_by_use_nulls (through GROUPING SETS / ROLLUP / CUBE / WITH TOTALS).

A correlated column referencing such an outer key was left non-Nullable by the analyzer: the group_by_use_nulls nullability walk in resolveExpressionNode stops at the subquery's own query scope and never reaches the outer scope that owns the key. At decorrelation time the plan feeds in the real Nullable column, so a baked function result type (e.g. toString -> String) no longer matches the actual Nullable(String), aborting with:

Logical error: Unexpected return type from toString. Expected String. Got Nullable(String)

The fix lets the scope walk continue past a subquery's query boundary for correlated columns and converts the correlated column to Nullable in place, preserving the node identity shared with the subquery's correlated-columns set (which the planner matches by identity).

How it was found

Discovered by the AST fuzzer in the amd_msan stress test of #108685: https://github.com/ClickHouse/ClickHouse/actions/runs/28286443407/job/83839128485?pr=108685

The exact query the fuzzer aborted on (seeded from 02122_parallel_formatting_CSV):

SELECT number, number + 1, concat('string: ', (SELECT toString(number)))
FROM numbers(200000) GROUP BY GROUPING SETS ((number)) WITH TOTALS
ORDER BY number ASC LIMIT 190000 FORMAT CSV;

Minimal reproducer:

SELECT concat('s', (SELECT toString(number)))
FROM numbers(5)
GROUP BY GROUPING SETS ((number))
SETTINGS group_by_use_nulls = 1;

Testing

  • New stateless test 04413_correlated_subquery_group_by_use_nulls covers GROUPING SETS / ROLLUP / CUBE / WITH TOTALS, asserts the correlated result is Nullable, checks a non-correlated subquery is unaffected, and includes the exact (scaled-down) query shape from the CI failure above.
  • Existing *correlated*, *group_by_use_nulls* and *grouping_sets* stateless tests pass locally with the fix.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a logical error (Unexpected return type ...) when a correlated scalar subquery references an outer GROUP BY key that becomes Nullable due to group_by_use_nulls (GROUPING SETS / ROLLUP / CUBE / WITH TOTALS).

A correlated scalar subquery referencing an outer `GROUP BY` key that
becomes `Nullable` via `group_by_use_nulls` (`GROUPING SETS` / `ROLLUP` /
`CUBE` / `WITH TOTALS`) aborted with:

  Logical error: Unexpected return type from toString. Expected String. Got Nullable(String)

The `group_by_use_nulls` nullability walk in `resolveExpressionNode`
stops at the subquery's own query scope, so a correlated column kept its
non-`Nullable` type while decorrelation fed in the real `Nullable`
column, mismatching a baked function result type (`toString` -> `String`
vs `Nullable(String)`).

Continue the walk past a subquery's query boundary for correlated
columns and convert the column to `Nullable` in place, preserving the
node identity shared with the subquery's correlated-columns set (which
the planner matches by identity).

Discovered by the AST fuzzer in the `amd_msan` stress test:
https://github.com/ClickHouse/ClickHouse/actions/runs/28286443407/job/83839128485?pr=108685
Related: ClickHouse#108685
Add the query shape the AST fuzzer aborted on in the `amd_msan` stress
test of ClickHouse#108685 (multi-column
projection with a correlated scalar subquery over a `GROUPING SETS` key
under `WITH TOTALS`), scaled down and deterministic, so the regression
test ties directly to the reported failure.
@coderashed

Copy link
Copy Markdown
Author

@coderashed coderashed closed this Jun 27, 2026
@groeneai

Copy link
Copy Markdown
Contributor

groeneai added a commit to groeneai/ClickHouse that referenced this pull request Jun 30, 2026
Two test cases contributed by @coderashed (who closed his duplicate PR
ClickHouse#108714 in favour of this one):

1. The exact AST-fuzzer seed query from ClickHouse#108714: a correlated scalar
   subquery toString(number) inside concat() over a GROUPING SETS key.
   Aborts on master with 'Bad cast ColumnNullable to ColumnString',
   returns s0..s4 with the fix.

2. A type-pinning EXPLAIN QUERY TREE check. The existing cases assert
   query results, which still pass if a future change silently stops
   wrapping the correlated key as Nullable (values unchanged, declared
   type wrong). These pin the analyzer-assigned types via
   SELECT ... FROM (EXPLAIN QUERY TREE ...) WHERE explain ILIKE '%...%',
   so dropping the Nullable wrapping fails with a clean diff.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants