[SPARK-56032][SQL][FOLLOWUP] Defer throw-capable notNullPreds past guard otherPreds in FilterExec CSE codegen by cloud-fan · Pull Request #55506 · apache/spark · GitHub
Skip to content

[SPARK-56032][SQL][FOLLOWUP] Defer throw-capable notNullPreds past guard otherPreds in FilterExec CSE codegen#55506

Open
cloud-fan wants to merge 9 commits intoapache:masterfrom
cloud-fan:cloud-fan/SPARK-56032-repro
Open

[SPARK-56032][SQL][FOLLOWUP] Defer throw-capable notNullPreds past guard otherPreds in FilterExec CSE codegen#55506
cloud-fan wants to merge 9 commits intoapache:masterfrom
cloud-fan:cloud-fan/SPARK-56032-repro

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan commented Apr 23, 2026

What changes were proposed in this pull request?

Follow-up to #54862 and #55471.

In FilterExec.doConsume's CSE branch, emit each IsNotNull(ref) from notNullPreds just before the first otherPred that references ref, and defer any remaining notNullPreds to after all otherPreds. This mirrors the non-CSE path's ordering. Previously (since #54862, refined in #55471) the CSE branch emitted the full block of notNullPreds up front, before any otherPred.

Why are the changes needed?

InferFiltersFromConstraints can add an IsNotNull(expr) whose inner expression can throw on valid non-null inputs -- e.g. IsNotNull(cast(s as int)) derived from a downstream join that expects the cast to be non-null. That IsNotNull lands in notNullPreds. The CSE branch was emitting it up front, so on any row where an earlier-ordered guard otherPred (e.g. kind = 'numeric') would have short-circuited, the upfront IsNotNull still evaluated cast(s as int) and threw CAST_INVALID_INPUT. The non-CSE branch interleaves by-reference and defers leftovers, so the same query succeeds there.

Repro (added as a new test): a CTE chain where an inner guard filter bounds the input to a type that the outer projection then casts; CombineFilters + PushPredicateThroughProject + InferFiltersFromConstraints fold everything into a single

Filter(isnotnull(kind) AND kind = 'numeric' AND isnotnull(cast(s as int)))

With CSE enabled the cast throws on non-numeric rows; with CSE disabled it succeeds on the same physical plan. Same class of bug as (a) in #55471 -- short-circuit preservation -- but across the notNullPreds / otherPreds boundary rather than between adjacent otherPreds.

Does this PR introduce any user-facing change?

No user-facing API change. Fixes a regression introduced in #54862 where FilterExec CSE codegen could throw on rows an earlier-ordered guard predicate would have rejected.

How was this patch tested?

Added `SPARK-56032: FilterExec CSE emits all notNullPreds before guard otherPred` to `WholeStageCodegenSuite`. Without the fix it throws `SparkNumberFormatException`; with the fix it matches the non-CSE result. All existing `WholeStageCodegenSuite` and `DataFrameSuite` tests continue to pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…ard otherPreds in FilterExec CSE codegen

### What changes were proposed in this pull request?

In `FilterExec.doConsume`'s CSE branch, emit each `IsNotNull(ref)` from
`notNullPreds` just before the first `otherPred` that references `ref`,
and defer any remaining `notNullPreds` to after all `otherPreds`. This
mirrors the non-CSE path's ordering. Previously the CSE branch emitted
the full block of `notNullPreds` up front, before any `otherPred`.

### Why are the changes needed?

`InferFiltersFromConstraints` can add an `IsNotNull(expr)` whose inner
expression can throw on valid non-null inputs -- e.g. `IsNotNull(cast(s
as int))` derived from a downstream join. That `IsNotNull` lands in
`notNullPreds`. The CSE branch was emitting it up front, so on any row
where an earlier-ordered guard `otherPred` (e.g. `kind = 'numeric'`)
would have short-circuited, the upfront `IsNotNull` still evaluated
`cast(s as int)` and threw `CAST_INVALID_INPUT`. The non-CSE branch
interleaves by-reference and defers the leftovers, so the same query
succeeds there.

Repro (new test): a CTE chain where an inner guard filter bounds the
input to a type that the outer projection then casts; `CombineFilters`
+ `PushPredicateThroughProject` + `InferFiltersFromConstraints` fold
everything into a single `Filter(isnotnull(kind) AND kind='numeric' AND
isnotnull(cast(s as int)))`. With CSE enabled the cast throws on
non-numeric rows; with CSE disabled it succeeds. Same class of bug as
(a) -- short-circuit preservation -- but across the
`notNullPreds`/`otherPreds` boundary instead of between adjacent
`otherPreds`.

### Does this PR introduce _any_ user-facing change?

No user-facing API change. Fixes a regression where `FilterExec` CSE
codegen could throw on rows an earlier-ordered guard predicate would
have rejected.

### How was this patch tested?

Added `SPARK-56032: FilterExec CSE emits all notNullPreds before guard
otherPred` to `WholeStageCodegenSuite`. Without the fix it throws
`SparkNumberFormatException`; with the fix it matches the non-CSE
result.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
Comment on lines +1034 to +1048
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider reducing the comments here, e.g. the query example here and in tests are basically identical?

- Fix missing verb in "The fix for (b)" comment block.
- Reframe the shared-ref test around invariant (b) and add a plan-shape
  assertion requiring both cast(s as int) conjuncts to survive, so
  optimizer drift cannot silently bypass the intended shape.

Co-authored-by: Isaac
…se CSE invariant comment

- WholeStageCodegenSuite: tighten Test 1 optimizer-drift guard to require
  both IsNotNull(Cast(...)) and a non-IsNotNull guard conjunct in the
  *same* FilterExec condition; otherwise a future optimizer split into
  two FilterExecs would pass the old assertion without the bug shape.
  Adds local conjuncts() helper rather than pulling in PredicateHelper.
- WholeStageCodegenSuite: add one-line rationale for
  ADAPTIVE_EXECUTION_ENABLED=false in both new tests (plan-shape
  visibility), so the intent isn't lost.
- basicPhysicalOperators.scala: condense the three-invariant preamble --
  (a) and (c) now lean on prior PRs (apache#55471) and only state the new
  fact; (b) keeps the expanded prose since it is the point of this PR.
…commentary

Align generatePredicateCode's notNullPreds lookup with the CSE branch by
using a total pattern match instead of an unchecked asInstanceOf[IsNotNull],
so the two paths stay structurally in lockstep if notNullPreds is ever
widened beyond IsNotNull. Behavior is unchanged today.

Also condense redundant commentary across the two SPARK-56032 tests:
pull the AQE-off rationale up once and cross-reference it from the
sibling test, and tighten the optimizer-drift guard note -- addressing
the reviewer's feedback on redundant commentary.

Co-authored-by: Isaac
@cloud-fan
Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants