perf(blake2s): reduce allocations and CPU in CASM hash wrapper by ongyimeng · Pull Request #3787 · NethermindEth/juno · GitHub
Skip to content

perf(blake2s): reduce allocations and CPU in CASM hash wrapper#3787

Merged
ongyimeng merged 3 commits into
mainfrom
perf/blake2s-reduce-allocs
Jul 3, 2026
Merged

perf(blake2s): reduce allocations and CPU in CASM hash wrapper#3787
ongyimeng merged 3 commits into
mainfrom
perf/blake2s-reduce-allocs

Conversation

@ongyimeng

@ongyimeng ongyimeng commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

This reduces allocations and CPU in Juno's wrapper around Blake2s, which is used when locally recomputing CASM HashVersionV2 hashes. The underlying golang.org/x/crypto/blake2s implementation is unchanged.

Closes #3751

Changes

  • Encode felts directly into the final byte buffer, removing the intermediate []uint32 allocation and copy.
  • Call Bits() once per felt and compare limbs directly against the 2^63 threshold, avoiding repeated Montgomery conversions from Cmp.
  • Use the one-shot blake2s.Sum256 API instead of New256 / Write / Sum for one-shot hashing.
  • Add BenchmarkCasmClassHashV2 for the CASM V2 recomputation path.

Results

Measured with benchstat (n=6) on a real mainnet class-heavy fixture: block 10727585, ~25k bytecode felts.

Metric Before After Delta
Time 2.673ms ± 17% 1.412ms ± 5% -47.18%
Memory 852.5 KiB ± 0% 561.7 KiB ± 0% -34.11%
Allocs 2602 ± 0% 1726 ± 0% -33.67%

The time baseline is noisier than memory and allocations, but the direction and allocation reduction are stable.

Testing

  • Existing hash vectors in hash_test.go cover the encoding boundaries (zero, 2^63, max uint64, mixed small/large values).
  • Existing class_test.go coverage verifies HashVersionV2 output against an expected compiled class hash.
  • utils_test.go is removed because it only tested private helpers that no longer exist, behavior is still covered by the public hash vectors.

@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.29%. Comparing base (7dbcfd9) to head (2a353f5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3787      +/-   ##
==========================================
- Coverage   75.34%   75.29%   -0.05%     
==========================================
  Files         438      438              
  Lines       39528    39536       +8     
==========================================
- Hits        29781    29770      -11     
- Misses       7676     7695      +19     
  Partials     2071     2071              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ongyimeng ongyimeng marked this pull request as ready for review July 3, 2026 03:35
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ongyimeng's task in 2m 2s —— View job


Review: PR #3787 — blake2s allocation/CPU reduction

  • Read changed files and diff vs origin/main
  • Verify encoding correctness (felt limbs, 2^63 threshold)
  • Verify Sum256 equivalence to New256/Write/Sum
  • Check the removed helper / test coverage
  • Post review

Nicely scoped performance change. The refactor is correct — I verified each transformation against the original behavior:

  • 2^63 threshold rewrite is exactly equivalent. Felt.Bits()fp.Element.Bits() returns the non-Montgomery limbs with fb[0] as the least-significant word. So fb[3]==0 && fb[2]==0 && fb[1]==0 && fb[0] < 2^63 is precisely f < 2^63, matching the old f.Cmp(&smallFeltThreshold) < 0. Bonus: it avoids the two extra Bits() conversions that Cmp would trigger.
  • Byte encoding is byte-identical. Small path ([hi32, lo32] of fb[0]), large path (8 words MSW→LSW with largeFeltMarker OR'd into the first word), and the per-word little-endian byte order (appendUint32LE) all reproduce the previous encodeFeltsToUint32s+encodeUint32sToBytes output. Capacity is preserved (len*5*4 bytes ≡ len*5 u32s).
  • Sum256 is equivalent to the unkeyed New256(nil)/Write/Sum path, and reversing the fixed [32]byte (BE→LE) matches the old slices.Reverse on the sum slice.
  • The removed private symbols (encodeFeltsToUint32s, encodeUint32sToBytes, smallFeltThreshold) have no remaining references.
  • Public vectors in hash_test.go exercise both encoding paths and the boundary end-to-end (1<<63 - 1, exactly 1<<63, large felt, mixed, max-u64), so deleting the private-helper tests doesn't lose meaningful coverage.

I could not run go test in this environment (command approval), but CI/codecov reports green with full coverage on the modified lines.

nit (test coverage)

The PR removes utils_test.go stating it "only tested private helpers that no longer exist." That holds for TestFeltToUint32Encoding and TestUint32ToBytes, but the file also contained TestUnsafeConversion, which was not a private-helper test — it guarded the memory-layout invariant ([4]uint64felt.Felt) behind the still-live unsafe.Slice cast in hash.go:24. All remaining hash_test.go vectors call Blake2sArray with F = felt.Felt, so the cast's assumption for other FeltLike types (e.g. felt.Hash) is no longer asserted anywhere.

Consider preserving TestUnsafeConversion (moved into hash_test.go or a package-internal _test.go) so that invariant stays covered. Not a blocker.

Summary

  • Blocker: none
  • Important: none
  • Nit: 1 — TestUnsafeConversion was dropped along with the genuinely-obsolete helper tests; it still guards a live unsafe assumption.

LGTM aside from the nit.

@ongyimeng

Copy link
Copy Markdown
Contributor Author

Will add back TestUnsafeConversion based on the nit.

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ongyimeng's task in 1m 48s —— View job


Re-review: PR #3787 — blake2s allocation/CPU reduction

New commit 4ff06afd8 test(blake2s): restore unsafe felt memory-layout invariant addresses the prior review nit.

  • Read new hash_internal_test.go and re-diff vs origin/main
  • Verify TestUnsafeConversion restores the memory-layout invariant it replaced
  • Sanity-check the rest of the diff still holds
  • Post review

The follow-up correctly resolves the prior nit. TestUnsafeConversion is byte-identical to the version that lived in the deleted utils_test.go, and it's now in hash_internal_test.go under package blake2s — a package-internal _test.go, exactly as suggested. It re-asserts the [4]uint64felt.Felt layout invariant that the still-live unsafe.Slice cast in hash.go:24 depends on. Both _test.go packages (blake2s and blake2s_test) coexist fine.

The rest of the change is unchanged from the previously-reviewed state and still holds (encoding is byte-identical, 2**63 limb comparison is exactly equivalent to the old Cmp, Sum256 matches New256/Write/Sum).

I couldn't run go test ./core/crypto/blake2s/ here (command approval), but codecov reports green with full coverage on modified lines.

One remaining nit

The deleted utils_test.go also contained TestFeltToBytesEncoding, which tested encodeFeltsToBytes directly at the byte level. That function still exists and is the heart of this optimization, so the PR description's rationale ("only tested private helpers that no longer exist") doesn't quite cover it. The behavior is still covered end-to-end by the hash_test.go vectors, so it's not a correctness gap — but since you're already restoring one test into hash_internal_test.go, restoring this one too would pin the exact byte layout rather than surfacing a regression as an opaque digest mismatch. Left as an inline nit. Not a blocker.

Summary

  • Blocker: none
  • Important: none
  • Nit: 1 — consider also restoring TestFeltToBytesEncoding (tests a still-live function)

Prior nit resolved. LGTM.

Comment thread core/crypto/blake2s/utils.go
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@ongyimeng ongyimeng merged commit 2c9551b into main Jul 3, 2026
30 of 31 checks passed
@ongyimeng ongyimeng deleted the perf/blake2s-reduce-allocs branch July 3, 2026 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hash: Reduce the amount of memory allocated when hashing using Blake2s

2 participants