Increase speed of Delta encoding using np.subtract by ehgus · Pull Request #584 · zarr-developers/numcodecs · GitHub
Skip to content

Increase speed of Delta encoding using np.subtract#584

Merged
dstansby merged 4 commits into
zarr-developers:mainfrom
ehgus:main
Oct 10, 2024
Merged

Increase speed of Delta encoding using np.subtract#584
dstansby merged 4 commits into
zarr-developers:mainfrom
ehgus:main

Conversation

@ehgus

@ehgus ehgus commented Sep 27, 2024

Copy link
Copy Markdown
Contributor

This PR accelerated Delta encoding by replacing np.diff into np.subtract.
Using inplace operation of np.subtract reduce array creation time.

Here is a simple benchmark:

from numcodecs import Delta
from numcodecs.compat import ensure_ndarray
import numpy as np
import time

# new codec
class CustomDelta(Delta):
    def encode(self, buf):
        arr = ensure_ndarray(buf).view(self.dtype)
        arr = arr.reshape(-1, order='A')
        enc = np.empty_like(arr, dtype=self.astype)
        enc[0] = arr[0]

        # only difference from Delta!!
        np.subtract(arr[1:], arr[0:-1], out = enc[1:])

        return enc

data_length = int(1e9)
x = np.random.randint(0,10000,(data_length,),dtype='i4')

# benchmark
base_codec = Delta(dtype='i4')
new_codec = CustomDelta(dtype='i4')
for codec in (base_codec, new_codec):
    time_start = time.perf_counter()
    for _ in range(10):
        codec.encode(x)
    time_end = time.perf_counter()
    benchmark_time = (time_end-time_start)/10
    print(f'time of {str(codec)} = {benchmark_time:.6f}')
# time of Delta(dtype='<i4') = 1.510299
# time of CustomDelta(dtype='<i4') = 0.953429

assert np.all(base_codec.encode(x) == new_codec.encode(x)), 'They should be equal'

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

@codecov

codecov Bot commented Sep 29, 2024

Copy link
Copy Markdown

@dstansby

dstansby commented Oct 9, 2024

Copy link
Copy Markdown
Contributor

Looks good to me 👍 - could you just leave a comment above the line you changed explaining that you're using subtract for speed, otherwise someone might come along later and switch it back 🙈

@ehgus

ehgus commented Oct 10, 2024

Copy link
Copy Markdown
Contributor Author

Sure! I will leave a comment.

@dstansby dstansby enabled auto-merge (squash) October 10, 2024 07:49
@dstansby dstansby merged commit 9518e0b into zarr-developers:main Oct 10, 2024
@jakirkham

Copy link
Copy Markdown
Member

Thanks Lee and David! 🙏

This is a good improvement. Always good to avoid unneeded array creation and copies 🙂

Noticed that bool needs a bit of extra handling. This wasn't tested previously. Idk to what extent it is used (maybe Delta + PackBits is effective in some cases?). Still it seems like a good idea to keep that case working given it is low effort. Made a small tweak to this effect with a test in PR: #595

@martindurant martindurant mentioned this pull request Oct 17, 2024
7 tasks
@bnavigator

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants