Use Store.get_many for whole-chunk reads in BatchedCodecPipeline by TomNicholas · Pull Request #4113 · zarr-developers/zarr-python · GitHub

Use Store.get_many for whole-chunk reads in BatchedCodecPipeline#4113

Draft
TomNicholas wants to merge 2 commits into zarr-developers:main from TomNicholas:feat/pipeline-use-get-many

Conversation

@TomNicholas

Member

Builds on #4112. BatchedCodecPipeline.read now fetches a whole (non-sharded) request with a single Store.get_many call instead of one get per chunk, so a store can batch/coalesce the underlying reads — independently of codec_pipeline.batch_size, which still governs only decode batching.

The sharding codec's partial-decode path is unchanged, and stores without a specialized get_many fall back to the previous concurrent per-chunk behavior.

Motivation — xref #1758 (request coalescing), #1806 (batched Store API), and zarr-developers/VirtualiZarr#947 (files-as-shards / consolidating small reads).

Stacked on #4112 — its commit is the first one here; review after it. Draft.

Add a public, overridable `Store.get_many` that retrieves many values at
once - each request being a whole key or a `(key, byte_range)` pair. It
generalizes `Store.get_ranges` (many ranges of one key) to many keys, and
yields `(request_index, Buffer | None)` batches in completion order so a
store can coalesce reads that land in the same underlying object.

The ABC default fetches requests concurrently with `get`, so every store
works out of the box; stores with a bulk backend override it (`FsspecStore`
coalesces via fsspec's `cat_ranges`). Coalescing tuning is left to each
store rather than exposed on the interface.
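
To illustrate the shape described above, here is a minimal sketch of a default `get_many` built on concurrent `get` calls. `ToyStore`, the `Request` alias, and the exact signatures are assumptions made for this example, not the API added in this PR; values are plain `bytes` rather than zarr `Buffer` objects.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Sequence
from typing import Tuple, Union

# Hypothetical request type: a whole key, or a (key, (start, end)) pair.
Request = Union[str, Tuple[str, Tuple[int, int]]]


class ToyStore:
    """In-memory stand-in for a store; not the zarr Store ABC."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def get(
        self, key: str, byte_range: tuple[int, int] | None = None
    ) -> bytes | None:
        value = self._data.get(key)
        if value is None or byte_range is None:
            return value
        start, end = byte_range
        return value[start:end]

    async def get_many(
        self, requests: Sequence[Request]
    ) -> AsyncIterator[list[tuple[int, bytes | None]]]:
        """Default strategy: one concurrent `get` per request, yielded as each finishes."""

        async def fetch(index: int, request: Request) -> tuple[int, bytes | None]:
            if isinstance(request, str):
                return index, await self.get(request)
            key, byte_range = request
            return index, await self.get(key, byte_range)

        tasks = [asyncio.ensure_future(fetch(i, r)) for i, r in enumerate(requests)]
        for finished in asyncio.as_completed(tasks):
            # Each batch here holds one result; a bulk backend could yield larger batches.
            yield [await finished]


async def demo() -> dict[int, bytes | None]:
    store = ToyStore({"c/0": b"abcdef", "c/1": b"ghijkl"})
    results: dict[int, bytes | None] = {}
    async for batch in store.get_many(["c/0", ("c/1", (1, 3)), "c/missing"]):
        results.update(batch)  # batch is a list of (request_index, value) pairs
    return results


results = asyncio.run(demo())
```

Because each result carries its request index, the caller does not depend on the order in which batches arrive, and a missing key simply shows up as `None` at its index.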

This restores and generalizes the batched-fetch capability of the v2
`getitems` Store API (see #1806).

BatchedCodecPipeline.read now fetches the encoded bytes for an entire
(non-sharded) read with a single Store.get_many call, instead of one
Store.get per chunk. It drives get_many over all chunk keys, scatters the
completion-ordered (index, buffer) results back into position, and feeds
them to the per-batch decode path.
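
The fetch, scatter, and decode steps described here could look roughly like the sketch below. `fake_get_many`, `decode_batch`, and `read` are hypothetical stand-ins written for this example (they are not the pipeline's actual functions), and the "decode" is a placeholder for the real codec chain.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


async def fake_get_many(
    keys: list[str],
) -> AsyncIterator[list[tuple[int, bytes | None]]]:
    """Hypothetical stand-in for Store.get_many: yields results out of request order."""
    data = {"c/0": b"aa", "c/1": b"bb", "c/2": b"cc"}
    for index in reversed(range(len(keys))):
        await asyncio.sleep(0)  # pretend to do I/O
        yield [(index, data.get(keys[index]))]


def decode_batch(encoded: list[bytes | None]) -> list[str | None]:
    """Placeholder decode step; a real pipeline would run the codec chain here."""
    return [None if buf is None else buf.decode().upper() for buf in encoded]


async def read(keys: list[str], batch_size: int) -> list[str | None]:
    # 1. One bulk fetch; scatter completion-ordered results back by request index.
    encoded: list[bytes | None] = [None] * len(keys)
    async for batch in fake_get_many(keys):
        for index, buf in batch:
            encoded[index] = buf
    # 2. Decode in batches of batch_size, independently of how the fetch was grouped.
    decoded: list[str | None] = []
    for start in range(0, len(keys), batch_size):
        decoded.extend(decode_batch(encoded[start : start + batch_size]))
    return decoded


chunks = asyncio.run(read(["c/0", "c/1", "c/missing", "c/2"], batch_size=2))
```

In this sketch `batch_size` only affects the decode loop; the fetch is always a single bulk call, which mirrors the separation of concerns the commit message describes.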

This lets a store batch or coalesce the underlying reads (e.g. FsspecStore
via cat_ranges, or a custom store such as virtualizarr's ManifestStore /
icechunk's IcechunkStore that overrides get_many) regardless of
codec_pipeline.batch_size, which still governs only decode batching. The
sharding codec's partial-decode path is untouched, and stores without a
specialized get_many fall back to the previous concurrent per-chunk gets.
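
As a rough illustration of what coalescing inside an overridden `get_many` might involve, the sketch below groups byte-range requests by underlying object and merges ranges that touch or nearly touch. The `coalesce` helper, the `(object, start, end)` request shape, and the `max_gap` parameter are all assumptions for this example; it is not how `FsspecStore`, `ManifestStore`, or `IcechunkStore` actually implement it.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical shapes, for illustration only.
RangeRequest = tuple[str, int, int]  # (object name, start, end), end exclusive
Member = tuple[int, int, int]  # (request_index, start, end)
MergedRead = tuple[str, int, int, list[Member]]


def coalesce(requests: list[RangeRequest], max_gap: int = 0) -> list[MergedRead]:
    """Merge ranges of the same object that are separated by at most max_gap bytes."""
    by_object: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for index, (obj, start, end) in enumerate(requests):
        by_object[obj].append((start, end, index))

    merged: list[MergedRead] = []
    for obj, ranges in by_object.items():
        ranges.sort()
        cur_start, cur_end, first_index = ranges[0]
        members: list[Member] = [(first_index, cur_start, cur_end)]
        for start, end, index in ranges[1:]:
            if start - cur_end <= max_gap:
                # Close enough: extend the current read to cover this request too.
                cur_end = max(cur_end, end)
                members.append((index, start, end))
            else:
                merged.append((obj, cur_start, cur_end, members))
                cur_start, cur_end = start, end
                members = [(index, start, end)]
        merged.append((obj, cur_start, cur_end, members))
    return merged


requests = [("a.nc", 0, 10), ("b.nc", 5, 9), ("a.nc", 10, 20), ("a.nc", 100, 110)]
merged = coalesce(requests, max_gap=0)
```

Here four requests collapse into three reads. Each merged read keeps the original request indices, so a store could slice the fetched bytes and yield `(request_index, buffer)` pairs as the interface expects.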
@TomNicholas TomNicholas force-pushed the feat/pipeline-use-get-many branch from d8a292d to 4f1ad9f Compare July 1, 2026 21:00
@codecov

codecov Bot commented Jul 1, 2026
