peter-toth · 2026-04-23T18:44:25Z

What changes were proposed in this pull request?

When a KeyedPartitioning passes through a PartitioningPreservingUnaryExecNode (e.g. ProjectExec), the previous implementation projected the partitioning as a whole expression via multiTransformDown. If any expression position could not be mapped to an output attribute, the entire KeyedPartitioning was silently dropped, resulting in UnknownPartitioning.

This PR replaces that approach with a per-position projection algorithm implemented in two new private helpers (projectKeyedPartitionings and projectOtherPartitionings), with the main outputPartitioning reduced to a simple split, project, and combine:

1. For each expression position (0..N-1), collect the unique expressions at that position across all input KeyedPartitionings (using ExpressionSet to deduplicate semantically equal expressions), then project each through the output aliases via projectExpression.
2. Positions with at least one projected alternative are projectable; they define the maximum achievable granularity. Positions that cannot be expressed in the output are dropped (narrowing).
3. The shared partitionKeys are projected to the subset of projectable positions via KeyedPartitioning.projectKeys.
4. The final KeyedPartitionings are the cross-product of per-position alternatives, computed lazily via MultiTransform.generateCartesianProduct, deduplicated, and bounded by a single outer take(aliasCandidateLimit).

All resulting KeyedPartitionings at the same granularity share the same partitionKeys object, preserving the invariant required by GroupPartitionsExec.
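
The four steps above can be sketched with a toy model in plain Python. This is not the Spark implementation: expressions are reduced to column-name strings, `aliases` is a hypothetical mapping from an input expression to the output names that expose it, and `project_expression`, `project_keyed_partitionings`, and `alias_candidate_limit` are illustrative stand-ins for the helpers named in the description.

```python
from dataclasses import dataclass
from itertools import islice, product


@dataclass(frozen=True)
class KeyedPartitioning:
    # Toy stand-in: expressions are column names, keys are rows of values.
    expressions: tuple
    partition_keys: tuple


def project_expression(expr, aliases):
    # Hypothetical helper: every output name that exposes `expr` (may be empty).
    return aliases.get(expr, [])


def project_keyed_partitionings(inputs, aliases, alias_candidate_limit=100):
    if not inputs:
        return []
    n = len(inputs[0].expressions)
    # Step 1: per position, collect unique expressions and project each one.
    per_position = []
    for i in range(n):
        unique = dict.fromkeys(kp.expressions[i] for kp in inputs)  # ordered dedup
        alternatives = []
        for expr in unique:
            for out in project_expression(expr, aliases):
                if out not in alternatives:
                    alternatives.append(out)
        per_position.append(alternatives)
    # Step 2: only positions with at least one alternative survive (narrowing).
    projectable = [i for i, alts in enumerate(per_position) if alts]
    if not projectable:
        return []
    # Step 3: project the shared keys once; every result reuses this object.
    shared_keys = tuple(
        tuple(row[i] for i in projectable) for row in inputs[0].partition_keys
    )
    # Step 4: combinations are yielded one at a time and cut off by a single
    # outer limit. Alternatives are already unique, so the tuples are too.
    combos = product(*(per_position[i] for i in projectable))
    return [
        KeyedPartitioning(exprs, shared_keys)
        for exprs in islice(combos, alias_candidate_limit)
    ]


# Partitioned by (a, b); the projection keeps `a` under two names and drops `b`.
kp = KeyedPartitioning(("a", "b"), ((1, "x"), (2, "y")))
result = project_keyed_partitionings([kp], {"a": ["a", "a_alias"]})
print([r.expressions for r in result])
print(result[0].partition_keys is result[1].partition_keys)
```

In this toy run both results are narrowed to one position and point at the same `partition_keys` object, which mirrors the object-identity invariant described above.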

Why are the changes needed?

Without narrowing, a ProjectExec that drops any one of a multi-column partition key causes the entire KeyedPartitioning to be lost. This breaks storage-partitioned join optimisations that rely on the partitioning surviving projection.
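
As a rough illustration of the difference, the following toy Python sketch contrasts all-or-nothing projection with per-position narrowing. The column names `region` and `day`, the one-alias-per-expression mapping, and both function names are assumptions made for the example, not Spark APIs.

```python
def project_whole(expressions, aliases):
    # All-or-nothing: a single unmapped position loses the whole partitioning.
    projected = [aliases.get(e) for e in expressions]
    return None if None in projected else tuple(projected)


def project_narrowing(expressions, aliases):
    # Per-position: keep what can be expressed in the output, drop the rest.
    projected = tuple(aliases[e] for e in expressions if e in aliases)
    return projected or None


# Partitioned by (region, day); the projection outputs only `region`.
aliases = {"region": "region"}
old = project_whole(("region", "day"), aliases)
new = project_narrowing(("region", "day"), aliases)
print(old)  # None, standing in for UnknownPartitioning
print(new)  # ('region',)
```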

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests in ProjectedOrderingAndPartitioningSuite covering:

- Full-granularity alias substitution (existing behaviour, unchanged)
- 2->1 narrowing without aliases
- 2->1 narrowing with alias, verifying shared partitionKeys object identity
- 3->2 narrowing with alias
- PartitioningCollection where one KP can be fully projected and another cannot

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6


peter-toth force-pushed the SPARK-46367-keyedpartitioning-projection branch from 433d560 to 0b3e7bc Compare April 23, 2026 18:49

peter-toth mentioned this pull request Apr 23, 2026

[SPARK-46367][SQL] Fix KeyedPartitioning not remapped through column aliases in ProjectExec #55475

Open

[SPARK-46367][SQL] Support narrowing projection of `KeyedPartitioning` in `PartitioningPreservingUnaryExecNode` #55519

peter-toth wants to merge 1 commit into apache:master from peter-toth:SPARK-46367-keyedpartitioning-projection
