[BWARE] Restructure CLALibBinaryCellOp and decompression fallbacks by Baunsgaard · Pull Request #2519 · apache/systemds · GitHub
Skip to content

[BWARE] Restructure CLALibBinaryCellOp and decompression fallbacks#2519

Merged
Baunsgaard merged 4 commits into
apache:mainfrom
Baunsgaard:split/binaryCellOp
Jun 28, 2026
Merged

[BWARE] Restructure CLALibBinaryCellOp and decompression fallbacks#2519
Baunsgaard merged 4 commits into
apache:mainfrom
Baunsgaard:split/binaryCellOp

Conversation

@Baunsgaard

Copy link
Copy Markdown
Contributor

Reworks how the compressed library handles binary cell-wise operations, broadening the set of inputs it can keep compressed and tightening the "give up and decompress" path so it is consistent across operation shapes.

… ops

Reworks how the compressed library handles binary cell-wise operations,
broadening the set of inputs it can keep compressed and tightening the
"give up and decompress" path so it is consistent across operation
shapes.

- CLALibBinaryCellOp:
    - large refactor (~340 lines): split per-operation routing into
      smaller dispatch helpers, push compressed/uncompressed decisions
      to the call site, and add fallbacks for shapes the compressed
      path does not support (SDC + complex shapes, etc.)
- BinaryMatrixMatrixCPInstruction:
    - on supported LibCommonsMath ops with compressed inputs, eagerly
      decompress so the math library always sees a plain MatrixBlock
- CompressedMatrixBlock:
    - centralize getUncompressed(opcode) callers; tighten how
      binaryOperationsLeft/Right/InPlace route into the lib
- CompressedMatrixBlockFactory:
    - small adjustments to align with the new lib entry points
transposeSelfMatrixMultOperations ignored the block returned by
CLALibTSMM.leftMultByTransposeSelf and returned the passed-in out
reference instead. The helper allocates (and for empty/null inputs
returns a freshly created block) internally, so returning the stale
out yielded a 0x0 result for the LEFT case. Return the helper's
result directly.
The new sparse-sparse column-vector path in CLALibBinaryCellOp only visits
the stored non-zeros of the compressed operand, which produced wrong results
for several operation/format combinations exercised by CLALibBinaryCellOpTest:

- Gate the sparse-sparse fast path on operator sparse-safety w.r.t. the
  compressed operand's zeros (isRowSafeLeft/isRowSafeRight on the vector).
  Non-sparse-safe ops (plus, minus, or, xor, min/max, ...) now fall back to
  the all-columns sparse path that evaluates every cell.
- Index the temporary sparse block by row instead of a dense flat offset in
  the left path (was (row-rl)*nCol, causing wrong rows / out-of-bounds).
- Sort the temporary sparse rows after decompression so values are appended
  to the output in ascending column order (decompressing multiple column
  groups can leave unsorted column indices, swapping output columns).

Also rename the sparse task helpers from processDense* to processSparse*,
replace the DECOMPRESSION_BLEN 16384/2 expression with 8192, and correct the
asyncCompressLock comment to describe its actual global scope.
@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

@Baunsgaard Baunsgaard merged commit e66f276 into apache:main Jun 28, 2026
50 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant