[SPARK-56907][SQL] Reduce per-value allocation in DELTA_LENGTH_BYTE_ARRAY Parquet vectorized reader by iemejia · Pull Request #55932 · apache/spark · GitHub

Open
iemejia wants to merge 1 commit into apache:master from iemejia:SPARK-delta-length-byte-array

iemejia commented May 17, 2026

Member

What changes were proposed in this pull request?

This PR reduces object allocation in the DELTA_LENGTH_BYTE_ARRAY vectorized Parquet reader (`VectorizedDeltaLengthByteArrayReader`) by applying three targeted changes:

**readBinary**: Replace the per-value `in.slice(length)` (one `ByteBuffer` allocation per value) with a single bulk `in.slice(totalDataLen)` that reads the entire batch at once. Individual values are then written to the column vector via `putByteArray` from the shared backing array, eliminating N-1 `ByteBuffer` object allocations.

**skipBinary**: Replace the per-value skip loop (N separate `in.skip()` calls) with a single bulk skip by summing all value lengths upfront.

**readGeoData**: Remove the per-value `ByteBuffer.wrap()` + `ByteBufferOutputWriter` indirection and call `putByteArray` directly from the converter output array.
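The readBinary change can be illustrated with plain `java.nio.ByteBuffer`. This is a minimal sketch and not the code in this PR: `BulkSliceSketch`, `readPerValue`, and `readBulk` are hypothetical names, a heap `ByteBuffer` stands in for the Parquet input stream, and a `List<byte[]>` stands in for the column vector that `putByteArray` writes to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Spark reader.
final class BulkSliceSketch {

    // Per-value approach: one ByteBuffer view object is allocated per value.
    static List<byte[]> readPerValue(ByteBuffer in, int[] lengths) {
        List<byte[]> out = new ArrayList<>();
        for (int len : lengths) {
            ByteBuffer slice = in.slice();      // new view object for every value
            slice.limit(len);
            byte[] value = new byte[len];
            slice.get(value);
            out.add(value);
            in.position(in.position() + len);   // advance past this value
        }
        return out;
    }

    // Bulk approach: one view for the whole batch, then offset arithmetic.
    static List<byte[]> readBulk(ByteBuffer in, int[] lengths) {
        int total = 0;
        for (int len : lengths) {
            total += len;
        }

        ByteBuffer batch = in.slice();          // single view for all values
        batch.limit(total);
        in.position(in.position() + total);     // advance past the whole batch

        // Assumption: a heap buffer, so the backing array is accessible.
        byte[] backing = batch.array();
        int offset = batch.arrayOffset() + batch.position();

        List<byte[]> out = new ArrayList<>();
        for (int len : lengths) {
            byte[] value = new byte[len];
            // Stand-in for writing (backing, offset, len) to a column vector.
            System.arraycopy(backing, offset, value, 0, len);
            out.add(value);
            offset += len;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        int[] lengths = {3, 0, 5};
        for (byte[] value : readBulk(ByteBuffer.wrap(data), lengths)) {
            System.out.println(new String(value, StandardCharsets.US_ASCII));
        }
    }
}
```

Both methods produce the same values; the bulk variant creates one view object per batch instead of one per value. A direct (off-heap) buffer has no accessible backing array, so a real reader would need a separate path for that case.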

Why are the changes needed?

The DELTA_LENGTH_BYTE_ARRAY encoding is used for binary/string columns in Parquet v2 pages. In the current vectorized reader, `readBinary` allocates one `ByteBuffer` per value via `in.slice(length)`, and `skipBinary` performs a separate stream skip per value. For large batches (e.g. 1M values per page), this creates significant allocation pressure and per-call overhead.
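The skip side of this can be sketched the same way. The example below is an illustration under stated assumptions, not the PR's implementation: `BulkSkipSketch` and its methods are hypothetical names, and a `java.io.InputStream` stands in for the Parquet input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the Spark reader.
final class BulkSkipSketch {

    // Per-value approach: one skip call per value.
    static void skipPerValue(InputStream in, int[] lengths, int start, int count) {
        for (int i = start; i < start + count; i++) {
            skipFully(in, lengths[i]);
        }
    }

    // Bulk approach: sum the lengths first, then skip once.
    static void skipBulk(InputStream in, int[] lengths, int start, int count) {
        long total = 0;
        for (int i = start; i < start + count; i++) {
            total += lengths[i];
        }
        skipFully(in, total);
    }

    // InputStream.skip may skip fewer bytes than requested, so loop until done.
    static void skipFully(InputStream in, long n) {
        try {
            while (n > 0) {
                long skipped = in.skip(n);
                if (skipped <= 0) {
                    throw new IllegalStateException("stream ended before skip completed");
                }
                n -= skipped;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] lengths = {2, 3, 1};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        skipBulk(in, lengths, 0, lengths.length);
        System.out.println("next byte after bulk skip: " + in.read());
    }
}
```

Both variants leave the stream at the same position; the bulk variant replaces N stream calls with one pass over the length array plus a single skip.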

Micro-benchmarks on `VectorizedDeltaReaderBenchmark` Group D show:

| Benchmark | Before (ms) | After (ms) | Speedup |
|---|---|---|---|
| `readBinary`, payloadLen=8 | 12 | 10 | 1.2x |
| `readBinary`, payloadLen=32 | 16 | 14 | 1.1x |
| `readBinary`, payloadLen=128 | 13 | 12 | 1.1x |
| `readBinary`, payloadLen=512 | 32 | 32 | ~1.0x |
| `skipBinary` (all sizes) | 7 | 5 | 1.4x |

The `readBinary` speedup is larger for small payloads, where allocation cost dominates. `skipBinary` shows a consistent 1.4x improvement across all payload sizes.

GitHub Actions (GHA) results on AMD EPYC 7763, compared against the upstream baseline on the same CPU. Speedup is the PR value divided by the baseline value:

| Case | JDK 17 Baseline | JDK 17 PR | JDK 17 Speedup | JDK 21 Baseline | JDK 21 PR | JDK 21 Speedup | JDK 25 Baseline | JDK 25 PR | JDK 25 Speedup |
|---|---|---|---|---|---|---|---|---|---|
| `readBinary`, payloadLen=8 | 48.1 | 55.0 | 1.14x | 51.7 | 56.6 | 1.09x | 48.7 | 57.7 | 1.18x |
| `readBinary`, payloadLen=32 | 48.0 | 52.9 | 1.10x | 63.9 | 53.9 | 0.84x | 48.9 | 57.2 | 1.17x |
| `skipBinary`, payloadLen=8 | 95.8 | 136.1 | 1.42x | 106.3 | 179.4 | 1.69x | 97.0 | 134.4 | 1.39x |
| `skipBinary`, payloadLen=32 | 97.6 | 136.8 | 1.40x | 175.9 | 179.0 | 1.02x | 97.0 | 129.5 | 1.34x |
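
The Speedup figures in the GHA table are consistent with dividing the PR value by the baseline value and rounding to two decimals, which suggests the raw numbers are throughput-like (higher is better). The units are not stated here, so that reading is an assumption. A small sketch of the arithmetic, with `SpeedupCheck` as a hypothetical helper name:

```java
// Hypothetical helper for checking the table arithmetic; not part of the PR.
final class SpeedupCheck {

    // Ratio of the PR value to the baseline value, rounded to two decimals.
    static double speedup(double baseline, double pr) {
        return Math.round(pr / baseline * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the GHA table.
        System.out.println("readBinary len=8,  JDK 17: " + speedup(48.1, 55.0));
        System.out.println("readBinary len=32, JDK 21: " + speedup(63.9, 53.9));
        System.out.println("skipBinary len=8,  JDK 21: " + speedup(106.3, 179.4));
    }
}
```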

Full committed results: JDK 17, JDK 21, JDK 25

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests: `ParquetDeltaLengthByteArrayEncodingSuite` (14 tests including serialization, random strings, empty strings, skip interleaving, and geo types) and `ParquetEncodingSuite` all pass.

Benchmarks: `VectorizedDeltaReaderBenchmark` Group D (DELTA_LENGTH_BYTE_ARRAY), run locally on JDK 17.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode with Claude claude-opus-4.6

iemejia force-pushed the SPARK-delta-length-byte-array branch from 1448f4e to 3c0d2aa on June 23, 2026 18:40