iframe-proxy

cloud-fan · 2026-04-23T09:10:52Z

What changes were proposed in this pull request?

Follow-up to #54862 and #55471.

In FilterExec.doConsume's CSE branch, emit each IsNotNull(ref) from notNullPreds just before the first otherPred that references ref, and defer any remaining notNullPreds to after all otherPreds. This mirrors the non-CSE path's ordering. Previously (since #54862, refined in #55471) the CSE branch emitted the full block of notNullPreds up front, before any otherPred.

Why are the changes needed?

InferFiltersFromConstraints can add an IsNotNull(expr) whose inner expression can throw on valid non-null inputs -- e.g. IsNotNull(cast(s as int)) derived from a downstream join that expects the cast to be non-null. That IsNotNull lands in notNullPreds. The CSE branch was emitting it up front, so on any row where an earlier-ordered guard otherPred (e.g. kind = 'numeric') would have short-circuited, the upfront IsNotNull still evaluated cast(s as int) and threw CAST_INVALID_INPUT. The non-CSE branch interleaves by-reference and defers leftovers, so the same query succeeds there.

Repro (added as a new test): a CTE chain where an inner guard filter bounds the input to a type that the outer projection then casts; CombineFilters + PushPredicateThroughProject + InferFiltersFromConstraints fold everything into a single

Filter(isnotnull(kind) AND kind = 'numeric' AND isnotnull(cast(s as int)))

With CSE enabled the cast throws on non-numeric rows; with CSE disabled it succeeds on the same physical plan. Same class of bug as (a) in #55471 -- short-circuit preservation -- but across the notNullPreds / otherPreds boundary rather than between adjacent otherPreds.

Does this PR introduce any user-facing change?

No user-facing API change. Fixes a regression introduced in #54862 where FilterExec CSE codegen could throw on rows an earlier-ordered guard predicate would have rejected.

How was this patch tested?

Added `SPARK-56032: FilterExec CSE emits all notNullPreds before guard otherPred` to `WholeStageCodegenSuite`. Without the fix it throws `SparkNumberFormatException`; with the fix it matches the non-CSE result. All existing `WholeStageCodegenSuite` and `DataFrameSuite` tests continue to pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…ard otherPreds in FilterExec CSE codegen ### What changes were proposed in this pull request? In `FilterExec.doConsume`'s CSE branch, emit each `IsNotNull(ref)` from `notNullPreds` just before the first `otherPred` that references `ref`, and defer any remaining `notNullPreds` to after all `otherPreds`. This mirrors the non-CSE path's ordering. Previously the CSE branch emitted the full block of `notNullPreds` up front, before any `otherPred`. ### Why are the changes needed? `InferFiltersFromConstraints` can add an `IsNotNull(expr)` whose inner expression can throw on valid non-null inputs -- e.g. `IsNotNull(cast(s as int))` derived from a downstream join. That `IsNotNull` lands in `notNullPreds`. The CSE branch was emitting it up front, so on any row where an earlier-ordered guard `otherPred` (e.g. `kind = 'numeric'`) would have short-circuited, the upfront `IsNotNull` still evaluated `cast(s as int)` and threw `CAST_INVALID_INPUT`. The non-CSE branch interleaves by-reference and defers the leftovers, so the same query succeeds there. Repro (new test): a CTE chain where an inner guard filter bounds the input to a type that the outer projection then casts; `CombineFilters` + `PushPredicateThroughProject` + `InferFiltersFromConstraints` fold everything into a single `Filter(isnotnull(kind) AND kind='numeric' AND isnotnull(cast(s as int)))`. With CSE enabled the cast throws on non-numeric rows; with CSE disabled it succeeds. Same class of bug as (a) -- short-circuit preservation -- but across the `notNullPreds`/`otherPreds` boundary instead of between adjacent `otherPreds`. ### Does this PR introduce _any_ user-facing change? No user-facing API change. Fixes a regression where `FilterExec` CSE codegen could throw on rows an earlier-ordered guard predicate would have rejected. ### How was this patch tested? Added `SPARK-56032: FilterExec CSE emits all notNullPreds before guard otherPred` to `WholeStageCodegenSuite`. Without the fix it throws `SparkNumberFormatException`; with the fix it matches the non-CSE result. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7

…an shape in test

yaooqinn · 2026-04-23T10:45:04Z

+    // The CSE branch in FilterExec.doConsume emits every notNullPred upfront,
+    // before any otherPred. When `InferFiltersFromConstraints` adds a notNull
+    // check on a throw-capable expression (`IsNotNull(cast(s as int))`) derived
+    // from a downstream join, that upfront evaluation fires on rows that the
+    // guard otherPred (`kind = 'numeric'`) would have rejected.
+    //
+    // Shape:
+    //   WITH guarded AS (SELECT s FROM t WHERE kind = 'numeric'),
+    //        derived AS (SELECT s, CAST(s AS INT) AS n FROM guarded),
+    //        pairs   AS (SELECT s, i FROM derived, idx WHERE i <= n)
+    //   SELECT s, i FROM pairs
+    // Optimized filter:
+    //   Filter(isnotnull(kind) AND kind = 'numeric' AND isnotnull(cast(s as int)))
+    // The non-CSE path defers leftover notNullPreds to after the otherPreds,
+    // so the guard short-circuits before the cast runs.


Consider reducing the comments here, e.g. the query example here and in tests are basically identical?

Co-authored-by: Isaac

- Fix missing verb in "The fix for (b)" comment block. - Reframe the shared-ref test around invariant (b) and add a plan-shape assertion requiring both cast(s as int) conjuncts to survive, so optimizer drift cannot silently bypass the intended shape. Co-authored-by: Isaac

…comment

…se CSE invariant comment - WholeStageCodegenSuite: tighten Test 1 optimizer-drift guard to require both IsNotNull(Cast(...)) and a non-IsNotNull guard conjunct in the *same* FilterExec condition; otherwise a future optimizer split into two FilterExecs would pass the old assertion without the bug shape. Adds local conjuncts() helper rather than pulling in PredicateHelper. - WholeStageCodegenSuite: add one-line rationale for ADAPTIVE_EXECUTION_ENABLED=false in both new tests (plan-shape visibility), so the intent isn't lost. - basicPhysicalOperators.scala: condense the three-invariant preamble -- (a) and (c) now lean on prior PRs (apache#55471) and only state the new fact; (b) keeps the expanded prose since it is the point of this PR.

…ver block and defensive null gate

…commentary Align generatePredicateCode's notNullPreds lookup with the CSE branch by using a total pattern match instead of an unchecked asInstanceOf[IsNotNull], so the two paths stay structurally in lockstep if notNullPreds is ever widened beyond IsNotNull. Behavior is unchanged today. Also condense redundant commentary across the two SPARK-56032 tests: pull the AQE-off rationale up once and cross-reference it from the sibling test, and tighten the optimizer-drift guard note -- addressing the reviewer's feedback on redundant commentary. Co-authored-by: Isaac

cloud-fan · 2026-04-23T15:27:55Z

cloud-fan added 2 commits April 23, 2026 09:10

Emit synthetic IsNotNull(r) in CSE branch to match non-CSE; assert pl…

420d973

…an shape in test

yaooqinn reviewed Apr 23, 2026

View reviewed changes

cloud-fan added 7 commits April 23, 2026 11:09

Add synthetic-IsNotNull CSE test; scope temp view names

032f8a3

Co-authored-by: Isaac

Gate genNotNull null-check on bound.nullable; use withTempView; trim …

9ccaf27

…comment

Rename SPARK-56032 test to reflect deferral semantics of the fix

14784d5

Cross-reference CSE/non-CSE FilterExec predicate paths; clarify lefto…

3de6f30

…ver block and defensive null gate

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56032][SQL][FOLLOWUP] Defer throw-capable notNullPreds past guard otherPreds in FilterExec CSE codegen#55506

[SPARK-56032][SQL][FOLLOWUP] Defer throw-capable notNullPreds past guard otherPreds in FilterExec CSE codegen#55506
cloud-fan wants to merge 9 commits intoapache:masterfrom
cloud-fan:cloud-fan/SPARK-56032-repro

cloud-fan commented Apr 23, 2026 •

edited

Loading

Uh oh!

yaooqinn Apr 23, 2026

Uh oh!

cloud-fan commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Sunbelt Computer Software

PL/B Language Development and Support

Conversation

cloud-fan commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

yaooqinn Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloud-fan commented Apr 23, 2026 •

edited

Loading