Add methods for getting bytes + json to store abc by d-v-b · Pull Request #3638 · zarr-developers/zarr-python · GitHub
Skip to content

Add methods for getting bytes + json to store abc#3638

Merged
d-v-b merged 19 commits into
zarr-developers:mainfrom
d-v-b:feat/get_json
Jan 26, 2026
Merged

Add methods for getting bytes + json to store abc#3638
d-v-b merged 19 commits into
zarr-developers:mainfrom
d-v-b:feat/get_json

Conversation

@d-v-b

@d-v-b d-v-b commented Jan 4, 2026

Copy link
Copy Markdown
Contributor

Adds some convenience methods to the store abc:

  • get_bytes_async
  • get_json_async
  • get_bytes
  • get_json

The async methods have a _async suffix. get_bytes and get_json are synchronous.

this PR allows the following, which is a huge QOL improvement for people working with Zarr stores.

import zarr
from zarr.storage import MemoryStore

store = MemoryStore({})
zg = zarr.create_group(store)

zmeta = store.get_json('zarr.json')
print(zmeta)
# {'attributes': {}, 'zarr_format': 3, 'node_type': 'group'}

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 4, 2026
@codecov

codecov Bot commented Jan 4, 2026

Copy link
Copy Markdown

@github-actions github-actions Bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jan 4, 2026
@paraseba

paraseba commented Jan 4, 2026

Copy link
Copy Markdown
Contributor

What is the advantage of making these helpers part of the ABC, instead of helper functions or a new concrete Store? Is the intention that certain implementations will override these? If so, what would be the advantages of overriding?

@d-v-b

d-v-b commented Jan 4, 2026

Copy link
Copy Markdown
Contributor Author

What is the advantage of making these helpers part of the ABC, instead of helper functions or a new concrete Store?

Putting this functionality on the ABC ensures that all stores implement it, without making per-store changes. Reading JSON is not actually new functionality, and ultimately I want to use these helpers internally (specifically, the async ones for reading JSON). I didn't do that in this PR because it requires changes to the StorePath class.

Is the intention that certain implementations will override these? If so, what would be the advantages of overriding?

Yes, in this PR the localstore and memorystore implementations override these to allow prototype to be None. See https://github.com/d-v-b/zarr-python/blob/d70a5e5277ca7a225d9fb0ac22945b5e1fcb8cc3/src/zarr/storage/_memory.py#L178-L184, and I'm following the pre-existing convention on those stores with these overrides.

The advantage in this case is to work around the cumbersome requirement that store methods take a prototype parameter, which IMO we should move away from, but that's something for another PR

Comment thread src/zarr/abc/store.py
...

async def get_bytes_async(
self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the output of this method is a bytes object (CPU memory), I don't see the value of specifying a prototype in this method. can anyone suggest a situation where specifying the prototype would be useful here?

cc @TomAugspurger

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you, that a prototype probably isn't needed. But I'd suggest including it for consistency with the other methods.

@TomAugspurger TomAugspurger left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since methods like get are asynchronous, I'd recommend making these convenience methods async-first and add a _sync suffix for the sync versions. get_bytes / get_bytes_async, etc.

Comment thread src/zarr/abc/store.py
...

async def get_bytes_async(
self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you, that a prototype probably isn't needed. But I'd suggest including it for consistency with the other methods.

Comment thread src/zarr/abc/store.py Outdated
Examples
--------
>>> store = MemoryStore()
>>> store.set("data", Buffer.from_bytes(b"hello world"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs an await.

Comment thread src/zarr/abc/store.py
----------
key : str, optional
The key identifying the data to retrieve. Defaults to an empty string.
prototype : BufferPrototype

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to align this with what we do in the async version.

>>> print(data)
b'hello'
"""
if prototype is None:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be done in the base class, and then all the child classes reuse the same implementation?

@d-v-b d-v-b Jan 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this direction. It would definitely be more convenient, but coupling the abstract store definition to the behavior of the default_buffter_prototype(), which in turn depends on the global config, seems much worse than other options (namely, baking buffer type into the store definition, or into store creation)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #3643

@d-v-b

d-v-b commented Jan 5, 2026

Copy link
Copy Markdown
Contributor Author

Since methods like get are asynchronous, I'd recommend making these convenience methods async-first and add a _sync suffix for the sync versions. get_bytes / get_bytes_async, etc.

@jhamman had the same preference and I haven't heard anyone strongly support the current state of this PR, so I'll make this change

@d-v-b

d-v-b commented Jan 8, 2026

Copy link
Copy Markdown
Contributor Author

get_json and get_bytes are now async, and get_json_sync and get_bytes_sync are now synchronous.

@d-v-b d-v-b mentioned this pull request Jan 8, 2026
@d-v-b d-v-b requested a review from TomAugspurger January 9, 2026 09:16
@d-v-b

d-v-b commented Jan 16, 2026

Copy link
Copy Markdown
Contributor Author

in the dev meeting this week @maxrjones suggested making the new methods private API, so:

store._get_json_sync
store._get_json
store._get_bytes
store._get_bytes_sync

Is that an accurate summary max?

@codspeed-hq

codspeed-hq Bot commented Jan 16, 2026

Copy link
Copy Markdown

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing d-v-b:feat/get_json (89c5acc) with main (b30ae18)

Summary

✅ 48 untouched benchmarks
⏩ 6 skipped benchmarks1

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@maxrjones

Copy link
Copy Markdown
Member

in the dev meeting this week @maxrjones suggested making the new methods private API, so:

store._get_json_sync
store._get_json
store._get_bytes
store._get_bytes_sync

Is that an accurate summary max?

this was my off-hand reaction, but I am glad to defer to those with better API design instincts

@d-v-b

d-v-b commented Jan 16, 2026

Copy link
Copy Markdown
Contributor Author

If it makes this PR more palatable, I'm happy making stuff private for now!

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 20, 2026
@d-v-b

d-v-b commented Jan 20, 2026

Copy link
Copy Markdown
Contributor Author

the new methods are now all private. Since "read JSON" is indisputably a core behavior of a Zarr store, I think we should make these public sooner rather than later.

@d-v-b

d-v-b commented Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

As the methods are private, I removed the release notes, so I think this is ready for final review

@maxrjones maxrjones left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prototype argument is awkward IMO, but seems worthwhile to merge this and try it out more to see whether that awkwardness matters.

@maxrjones maxrjones left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prototype argument is awkward IMO, but seems worthwhile to merge this and try it out more to see whether that awkwardness matters.

@d-v-b

d-v-b commented Jan 26, 2026

Copy link
Copy Markdown
Contributor Author

@d-v-b d-v-b merged commit 0e61449 into zarr-developers:main Jan 26, 2026
26 checks passed
@d-v-b d-v-b deleted the feat/get_json branch January 26, 2026 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants