Fix JIT crash on arithmetic in correlated subquery with group_by_use_nulls #108558
groeneai wants to merge 1 commit into
A correlated scalar subquery whose body contains an arithmetic expression over the outer GROUP BY key, evaluated under group_by_use_nulls with WITH CUBE/ROLLUP, produces an internally inconsistent ActionsDAG: decorrelation feeds a Nullable correlated-column INPUT into minus/plus/etc. nodes that were resolved (and keep their result type) for the non-Nullable key. The interpreter tolerates this, but the expression JIT bakes in the declared node types and aborts in llvm::BinaryOperator::Create with "Cannot create binary operator with two operands of differing type" (found by the AST fuzzer).

Guard isCompilableFunction against this: when a child's actual result type differs from the type the function was resolved for (function.getArgumentTypes()[i]), the node cannot be faithfully JIT-compiled, so skip JIT for it and let it run in the interpreter, which produces the correct result.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cc @KochetovNicolai @novikd, could you review this? It fixes an AST-fuzzer JIT crash (
@@ -388,6 +388,21 @@ static bool isCompilableFunction(const ActionsDAG::Node & node, const std::unord
    {
        return false;
    }
@Algunenano You are right, same mistake as #107263. The JIT guard is the wrong layer.
Root cause is the caller: under group_by_use_nulls, decorrelation feeds the correlated subquery INPUT as Nullable(UInt64) (from the outer Nullable GROUP BY key), but the consuming minus nodes keep the non-nullable Int64 result type they were resolved with. The ActionsDAG is internally inconsistent. The interpreter tolerates it, whereas the JIT bakes in the declared type and aborts. This is the #106377 family.
It is already being fixed in the right layer: the comprehensive analyzer fix #100365 makes a correlated column Nullable at resolution time when it references an outer Nullable group-by key, so the subquery functions are rebuilt as Nullable(Int64) from the start and the DAG stays consistent. Its own code comment describes exactly this shape ("the inner expression DAG, function bindings ... built with the pre-Nullable type and later fail with a type mismatch at runtime"). #104350 is related. novikd already directed on #102597 that this belongs in QueryAnalyzer, not the planner or JIT.
Closing this PR. I will not add a competing fix. Adding to #106377 the JIT-abort reproducer and a sibling JIT-independent variant (LOGICAL_ERROR: Unexpected return type from toUInt32 ... Got Nullable(UInt32), reproduces with compile_expressions=0) so the analyzer fix is validated against both shapes.
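
To make the type mismatch described in this comment concrete, here is a toy C++ sketch of why resolving the function against the Nullable type from the start keeps the DAG consistent. It is not ClickHouse code: types are plain strings, and `toyIsNullable`, `toyRemoveNullable` and `toyResolveMinus` are hypothetical helpers that model only the single rule needed here (UInt64 minus UInt8 yields Int64, wrapped in Nullable when an argument is Nullable).

```cpp
#include <cassert>
#include <string>

// Toy model only: a type is represented by its name.
// These helpers are hypothetical and are not the ClickHouse API.
bool toyIsNullable(const std::string & type)
{
    return type.rfind("Nullable(", 0) == 0;
}

std::string toyRemoveNullable(const std::string & type)
{
    if (!toyIsNullable(type))
        return type;
    // Strip the leading "Nullable(" (9 characters) and the trailing ")".
    return type.substr(9, type.size() - 10);
}

// Resolve the result type of minus for the one argument pair modelled here.
std::string toyResolveMinus(const std::string & lhs, const std::string & rhs)
{
    assert(toyRemoveNullable(lhs) == "UInt64" && toyRemoveNullable(rhs) == "UInt8");
    const std::string result = "Int64";
    if (toyIsNullable(lhs) || toyIsNullable(rhs))
        return "Nullable(" + result + ")";
    return result;
}
```

Resolved against the pre-Nullable key, the toy function declares `Int64` even though a `Nullable(UInt64)` value is later fed into it. Resolved against the Nullable key, it declares `Nullable(Int64)`, so the declared and actual types agree.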

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a crash ("Cannot create binary operator with two operands of differing type") when JIT expression compilation (compile_expressions=1) is used for a correlated scalar subquery whose body has an arithmetic expression over the outer GROUP BY key, evaluated with group_by_use_nulls=1 and WITH CUBE/ROLLUP.

Description
Found by the AST fuzzer (STID 1176-28cc), AST fuzzer (amd_debug).
CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/107318/7aa7dc7aa719bcad6ad9683348286fec0b6f07f0/ast_fuzzer_amd_debug/fatal.log
Reproducer:
Root cause: under group_by_use_nulls the outer GROUP BY key becomes Nullable, and correlated-subquery decorrelation feeds that Nullable(UInt64) value into the subquery's minus nodes, which were resolved (and keep their result type) for the non-Nullable UInt64 key. The resulting ActionsDAG is internally inconsistent: it contains a function node whose child type no longer matches the type its function_base was resolved for. The interpreter tolerates this and returns the correct result, but the expression JIT bakes in the declared node types and emits an LLVM binary operator with mismatched operand types, hitting llvm::BinaryOperator::Create's assertion in debug builds.

Fix: in isCompilableFunction, skip JIT for a function node when a child's actual result_type differs from the function's resolved getArgumentTypes()[i]. Such a node cannot be faithfully JIT-compiled; it falls back to the interpreter, which is correct. JIT for well-formed expressions is unaffected. Added a regression test (04365_jit_correlated_subquery_group_by_use_nulls).

Note: the underlying decorrelation type inconsistency has a second, JIT-independent symptom (e.g. SELECT (SELECT toUInt32(number) - toFloat64(number - 1)) ... GROUP BY number WITH CUBE SETTINGS group_by_use_nulls = 1, allow_experimental_correlated_subqueries = 1 raises LOGICAL_ERROR: Unexpected return type from toUInt32. Expected UInt32. Got Nullable(UInt32) with compile_expressions = 0 too). That is a separate planner/analyzer issue in the experimental correlated-subqueries feature and is out of scope for this JIT-crash fix.
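
The check described under "Fix" can be sketched in isolation as follows. This is a minimal illustration, not the actual patch: `ToyNode` and `argumentTypesMatch` are hypothetical stand-ins for ActionsDAG::Node and the added guard, and types are compared as plain strings rather than as ClickHouse data type objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DAG function node; NOT the real ActionsDAG::Node.
struct ToyNode
{
    std::string result_type;                      // type the node actually produces
    std::vector<const ToyNode *> children;        // argument nodes
    std::vector<std::string> resolved_arg_types;  // types the function was resolved for
};

// Returns true only if every child's actual result type equals the type the
// function was resolved for. In this sketch, a caller deciding whether to
// JIT-compile the node would skip compilation when this returns false.
bool argumentTypesMatch(const ToyNode & node)
{
    assert(node.children.size() == node.resolved_arg_types.size());
    for (std::size_t i = 0; i < node.children.size(); ++i)
    {
        if (node.children[i]->result_type != node.resolved_arg_types[i])
            return false;
    }
    return true;
}
```

For a minus node resolved for (UInt64, UInt8) whose first child now produces Nullable(UInt64), the check returns false, which in this model corresponds to leaving the node to the interpreter.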