branch-4.0: [fix](nereids) Backport unique-function filter push-down guards (#62705) by yujun777 · Pull Request #62750 · apache/doris · GitHub

branch-4.0: [fix](nereids) Backport unique-function filter push-down guards (#62705) #62750

Open
yujun777 wants to merge 2 commits into apache:branch-4.0 from yujun777:backport-pr-62705-branch-4.0


@yujun777
Contributor

What problem does this PR solve?

Backport of #62705 to branch-4.0.

Problem Summary:

  • Backport the unique-function push-down guards for Generate and the CTE consumer.
  • Adapt PushDownFilterThroughGenerateTest to use ExplodeNumbers on branch-4.0.

Release note

Fix wrong results on branch-4.0 when filters containing non-idempotent functions are pushed through Generate or CTE consumer.

Check List (For Author)

… through Generate and CTE consumer (apache#62705)

Issue Number: close apache#25201, close apache#25202

Problem Summary:
Two Nereids rewrite rules moved filter conjuncts that contain
non-idempotent `UniqueFunction` calls (`rand` / `uuid` / `random_bytes`
/ `uuid_numeric`) across operators that change how many times the unique
function is evaluated, producing wrong results.

1. `PushDownFilterThroughGenerate` pushed a conjunct like `t1.id +
rand(1,100) > 5` below `LogicalGenerate`. Before the push, `rand` is
evaluated per generated row; after, it is evaluated per base row and
then the result is duplicated for every row produced by generate, so
groups of N generated rows share a single rand value instead of N
independent ones.

2. `CollectFilterAboveConsumer` registered filter conjuncts above a CTE
consumer into `cascadesContext.putConsumerIdToFilter(...)`, after which
`RewriteCteChildren.tryToConstructFilter` would OR them together and push
the result into the CTE producer. For a conjunct like `rand() > 0.1`, the
random filter then runs once on the producer scan and again in each
consumer filter, so different consumers see inconsistent rows.
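
The two cases above can be illustrated with a toy simulation. This is a minimal sketch in Python, not Doris code: `CountingRand` and the four functions are hypothetical names, the generator is modeled as `range(n)`, and the threshold `50` is chosen only to make the filtering visible (the PR's example uses `> 5`). It counts how often the random function is evaluated under each plan shape.

```python
import random


class CountingRand:
    """Seeded random source that counts how often it is evaluated."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.calls = 0

    def randint(self, lo, hi):
        self.calls += 1
        return self._rng.randint(lo, hi)

    def random(self):
        self.calls += 1
        return self._rng.random()


# Case 1: filter above vs. below a generate that emits n rows per base row.
def filter_above_generate(base_ids, n, rand):
    # The predicate runs once per generated row.
    return [(i, g) for i in base_ids for g in range(n)
            if i + rand.randint(1, 100) > 50]


def filter_below_generate(base_ids, n, rand):
    # The predicate runs once per base row; survivors are then expanded.
    kept = [i for i in base_ids if i + rand.randint(1, 100) > 50]
    return [(i, g) for i in kept for g in range(n)]


# Case 2: two CTE consumers, each with the conjunct rand() > 0.1.
def consumers_only(rows, rand, consumers=2):
    return [[r for r in rows if rand.random() > 0.1] for _ in range(consumers)]


def pushed_into_producer(rows, rand, consumers=2):
    # The producer filter is the OR of the consumers' conjuncts.
    produced = [r for r in rows
                if any(rand.random() > 0.1 for _ in range(consumers))]
    outputs = [[r for r in produced if rand.random() > 0.1]
               for _ in range(consumers)]
    return produced, outputs


base_ids, n = [1, 2, 3], 4
above_rand, below_rand = CountingRand(0), CountingRand(0)
above = filter_above_generate(base_ids, n, above_rand)
below = filter_below_generate(base_ids, n, below_rand)
print(above_rand.calls, below_rand.calls)  # 12 evaluations vs. 3

rows = list(range(100))
plain_rand, pushed_rand = CountingRand(0), CountingRand(0)
plain_outputs = consumers_only(rows, plain_rand)
produced, pushed_outputs = pushed_into_producer(rows, pushed_rand)
print(plain_rand.calls, pushed_rand.calls)
```

In the pushed-down Generate variant, every base row either keeps all `n` generated rows or none of them. In the pushed-down CTE variant, a row has to survive an extra random draw in the producer before any consumer sees it.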

Fix: in both rules, skip conjuncts whose `containsUniqueFunction()` is
true so they stay above the operator and are evaluated once per output
row.
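
The guard can be pictured as a partition of the filter's conjuncts. The sketch below is a simplified Python model, not the Doris implementation: `Expr`, `split_conjuncts`, and `contains_unique_function` are hypothetical stand-ins (the last one mirrors the `containsUniqueFunction()` check named above), and any other pushability conditions the real rules apply are omitted.

```python
from dataclasses import dataclass

# Function names listed in the problem summary as non-idempotent.
UNIQUE_FUNCTIONS = {"rand", "uuid", "random_bytes", "uuid_numeric"}


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: an operator or leaf name plus children."""
    name: str
    children: tuple = ()

    def contains_unique_function(self):
        # True if this node or any descendant is a unique function call.
        return (self.name in UNIQUE_FUNCTIONS
                or any(c.contains_unique_function() for c in self.children))


def split_conjuncts(conjuncts):
    """Return (pushable, kept): conjuncts with unique functions stay put."""
    pushable, kept = [], []
    for conjunct in conjuncts:
        if conjunct.contains_unique_function():
            kept.append(conjunct)
        else:
            pushable.append(conjunct)
    return pushable, kept


# t1.id + rand(1, 100) > 5  AND  t1.id = 1
random_conjunct = Expr(">", (Expr("+", (Expr("t1.id"), Expr("rand"))), Expr("5")))
plain_conjunct = Expr("=", (Expr("t1.id"), Expr("1")))
pushable, kept = split_conjuncts([random_conjunct, plain_conjunct])
```

With this split, a mixed filter still gets its deterministic conjuncts pushed down, while the `rand` conjunct stays above the operator.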

Adjacent rules
(`PushDownFilterThroughRepeat/Window/PartitionTopN/SetOperation`,
`PullUpPredicates` and its consumers) have the same class of bug but are
out of scope for this PR and will be addressed separately.

(cherry picked from commit 55ee1fa)

Keep the PR apache#62705 backport test compatible with branch-4.0 by using ExplodeNumbers instead of the newer Unnest generator.

Key changes:
- replace Unnest with ExplodeNumbers in PushDownFilterThroughGenerateTest

Unit Test:
- ./build.sh --fe --clean -j20
- ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushDownFilterThroughGenerateTest,org.apache.doris.nereids.rules.rewrite.CollectFilterAboveConsumerTest
@yujun777
Contributor Author

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

/review

@yujun777 changed the title from "[fix](nereids) Backport unique-function filter push-down guards to branch-4.0" to "branch-4.0: [fix](nereids) Backport unique-function filter push-down guards (#62705)" on Apr 23, 2026
Contributor

@github-actions (bot) left a comment


Review summary:

  • Correctness: no blocking issue found. The two changed rewrites now consistently leave UniqueFunction conjuncts above Generate / CTE-consumer boundaries, which matches the intended semantics and avoids the wrong-result cases described in the PR.
  • Backport fidelity: the code changes match upstream PR #62705, with the expected branch-4.0-only unit-test adaptation from Unnest to ExplodeNumbers.
  • Tests: the new FE unit tests and regression shape tests cover the target scenarios (pure unique-function predicates and mixed deterministic/unique conjuncts). I could not fully execute the FE unit-test command in this runner because thirdparty/installed/bin/protoc is missing from the environment, so runtime verification here was limited to code inspection plus the added coverage.
  • Release note / docs: the PR body includes a release note, and I do not see additional documentation changes needed for this optimizer correctness fix.
  • User focus: no additional user-provided review focus was supplied.

I did not find a critical blocking issue in this PR.

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 75.00% (3/4) 🎉
Increment coverage report
Complete coverage report

@yujun777
Contributor Author

run cloud_p0

@yujun777
Contributor Author

run p0

@yujun777
Contributor Author

2 participants