{{ message }}
Use Store.get_many for whole-chunk reads in BatchedCodecPipeline#4113
Draft
TomNicholas wants to merge 2 commits into
Draft
Use Store.get_many for whole-chunk reads in BatchedCodecPipeline#4113TomNicholas wants to merge 2 commits into
TomNicholas wants to merge 2 commits into
Conversation
Add a public, overridable `Store.get_many` that retrieves many values at once - each request being a whole key or a `(key, byte_range)` pair. It generalizes `Store.get_ranges` (many ranges of one key) to many keys, and yields `(request_index, Buffer | None)` batches in completion order so a store can coalesce reads that land in the same underlying object. The ABC default fetches requests concurrently with `get`, so every store works out of the box; stores with a bulk backend override it (`FsspecStore` coalesces via fsspec's `cat_ranges`). Coalescing tuning is left to each store rather than exposed on the interface. This restores and generalizes the batched-fetch capability of the v2 `getitems` Store API (see zarr-developersgh-1806).
BatchedCodecPipeline.read now fetches the encoded bytes for an entire (non-sharded) read with a single Store.get_many call, instead of one Store.get per chunk. It drives get_many over all chunk keys, scatters the completion-ordered (index, buffer) results back into position, and feeds them to the per-batch decode path. This lets a store batch or coalesce the underlying reads (e.g. FsspecStore via cat_ranges, or a custom store such as virtualizarr's ManifestStore / icechunk's IcechunkStore that overrides get_many) regardless of codec_pipeline.batch_size, which still governs only decode batching. The sharding codec's partial-decode path is untouched, and stores without a specialized get_many fall back to the previous concurrent per-chunk gets.
d8a292d to
4f1ad9f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Builds on #4112.
BatchedCodecPipeline.readnow fetches a whole (non-sharded) request with a singleStore.get_manycall instead of onegetper chunk, so a store can batch/coalesce the underlying reads — independently ofcodec_pipeline.batch_size, which still governs only decode batching.The sharding codec's partial-decode path is unchanged, and stores without a specialized
get_manyfall back to the previous concurrent per-chunk behavior.Motivation — xref #1758 (request coalescing), #1806 (batched Store API), and zarr-developers/VirtualiZarr#947 (files-as-shards / consolidating small reads).
Stacked on #4112 — its commit is the first one here; review after it. Draft.