#12962 removed the lookup table for unpackbits, causing a factor-of-8 performance regression:
           before           after         ratio
         [08b17aee]       [e6227a03]
         <v1.16.3^0>       <master>
    +      5.80±0.5μs        46.4±1μs     8.00  bench_core.UnpackBits.time_unpackbits
    +         120±2μs        911±20μs     7.58  bench_core.UnpackBits.time_unpackbits_axis1
While the function is probably not that critical for performance, the change is of rather large magnitude and should probably be mentioned in the release notes.
As for a fix: to keep the code simple, one could just bitswap the data. Well-performing functions for that could be generally useful and might even be worth exposing (e.g. blosc has code for that).
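To make the suggestion concrete, here is a minimal pure-Python sketch of the idea, not NumPy's actual C implementation. It assumes the lookup table was dropped in order to support a second bit order; all names (`UNPACK_TABLE`, `REVERSE_TABLE`, `unpack_bits`) and the `bitorder` argument of this helper are hypothetical. The single MSB-first table is kept, and the other bit order is handled by reversing the bits of each byte before the lookup.

```python
# Illustrative sketch only; every name here is hypothetical.

# Lookup table: byte value -> its 8 bits, most significant bit first.
UNPACK_TABLE = [
    tuple((value >> (7 - i)) & 1 for i in range(8)) for value in range(256)
]

# Bitswap table: byte value -> the same byte with its bit order reversed.
REVERSE_TABLE = bytes(
    sum(((value >> i) & 1) << (7 - i) for i in range(8)) for value in range(256)
)


def unpack_bits(data, bitorder="big"):
    """Unpack each byte of `data` into 8 ints (0 or 1) using the lookup table."""
    data = bytes(data)
    if bitorder == "little":
        # Bitswap first, then reuse the single MSB-first table.
        data = data.translate(REVERSE_TABLE)
    elif bitorder != "big":
        raise ValueError("bitorder must be 'big' or 'little'")
    out = []
    for byte in data:
        out.extend(UNPACK_TABLE[byte])
    return out


print(unpack_bits([0b10110000]))            # [1, 0, 1, 1, 0, 0, 0, 0]
print(unpack_bits([0b10110000], "little"))  # [0, 0, 0, 0, 1, 1, 0, 1]
```

In this sketch the extra cost of the second bit order is one table-driven pass over the input, and the common path stays a plain table lookup.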