refactor: one Array class with sync+async APIs by maxrjones · Pull Request #4049 · zarr-developers/zarr-python · GitHub
Skip to content

refactor: one Array class with sync+async APIs#4049

Draft
maxrjones wants to merge 3 commits into
zarr-developers:mainfrom
maxrjones:refactor/one-array-class-keep-async
Draft

refactor: one Array class with sync+async APIs#4049
maxrjones wants to merge 3 commits into
zarr-developers:mainfrom
maxrjones:refactor/one-array-class-keep-async

Conversation

@maxrjones

Copy link
Copy Markdown
Member

Alternative to #4034 that delivers every #4027 goal except removing the public AsyncArray class.

  • Array gains a user-friendly constructor: Array(metadata, store_path, config=...) builds the wrapped AsyncArray internally; Array(async_array) still works for back-compat.
  • Array gains the full public async surface (*_async twins for every sync method), so callers no longer reach into the private _async_array.
  • Each operation has a single implementation: the async twin holds the logic and the sync method delegates via sync(). No drift between the pairs; a parity test guards their signatures.
  • No Runner protocol and no _AsyncArrayView/_rebind_state shared-state machinery: Array still wraps one AsyncArray, so async_array is the held object and mutation coherence is free.
  • getitem_async/setitem_async mirror the full basic-selection parameter surface (out=, fields=, prototype=).

Tests: integrates the async/sync boundary suite (shared-state coupling, codec-pipeline-built-once, selection parity, out= zero-copy, full parameter surface). The loop-affinity test is xfail (strict) as a design-independent limitation of mixing zarr's background loop with a user-owned loop on stores with event-loop affinity.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Alternative to zarr-developers#4034 that delivers every zarr-developers#4027 goal except removing the
public AsyncArray class.

- Array gains a user-friendly constructor: Array(metadata, store_path,
  config=...) builds the wrapped AsyncArray internally; Array(async_array)
  still works for back-compat.
- Array gains the full public async surface (*_async twins for every sync
  method), so callers no longer reach into the private _async_array.
- Each operation has a single implementation: the async twin holds the
  logic and the sync method delegates via sync(). No drift between the
  pairs; a parity test guards their signatures.
- No Runner protocol and no _AsyncArrayView/_rebind_state shared-state
  machinery: Array still wraps one AsyncArray, so async_array is the held
  object and mutation coherence is free.
- getitem_async/setitem_async mirror the full basic-selection parameter
  surface (out=, fields=, prototype=).

Tests: integrates the async/sync boundary suite (shared-state coupling,
codec-pipeline-built-once, selection parity, out= zero-copy, full
parameter surface). The loop-affinity test is xfail (strict) as a
design-independent limitation of mixing zarr's background loop with a
user-owned loop on stores with event-loop affinity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jun 7, 2026
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

@github-actions github-actions Bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jun 8, 2026

@mkitti mkitti left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have the pattern backwards here. If the objective is to lower overhead for the synchronous API,

  • Instead of calling sync on the asynchronous API ...
  • Use asyncio.to_thread to call a synchronous function.

This way the synchronous API has as little overhead as possible. Meanwhile, we still have an asynchronous API that is still asynchronous.

Comment thread src/zarr/core/array.py
6
"""
return sync(self.async_array.nchunks_initialized())
return sync(self.nchunks_initialized_async())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the "synchronous" calls just call sync on their asynchronous versions, then we still incur all the overhead of scheduling an async task.

Comment thread src/zarr/core/array.py
Comment on lines +3781 to +3784
delete_outside_chunks : bool, optional
If True (default), chunks that fall entirely outside the new array
shape are deleted from the underlying store. If False, those chunks
are left in place.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be exposing new keyword arguments in the synchronous API at this time? Could we defer this change to a later pull request?

@maxrjones maxrjones marked this pull request as draft June 8, 2026 20:31
@maxrjones

Copy link
Copy Markdown
Member Author

thanks for your review @mkitti! I should've marked this PR as a draft. It's one of a few possible solutions related to #4027, including #4034 and generated sync code. I'd be curious for your overall take on the different options.

@mkitti

mkitti commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants