Fix/5614 grouped conv cpu lowering by alisterburt · Pull Request #6710 · modular/modular · GitHub
Skip to content

Fix/5614 grouped conv cpu lowering#6710

Open
alisterburt wants to merge 2 commits into
modular:mainfrom
alisterburt:fix/5614-grouped-conv-cpu-lowering
Open

Fix/5614 grouped conv cpu lowering#6710
alisterburt wants to merge 2 commits into
modular:mainfrom
alisterburt:fix/5614-grouped-conv-cpu-lowering

Conversation

@alisterburt

Copy link
Copy Markdown

Assisted-by: Claude Opus 4.8

Linked issue

Fixes #5614 (PR opened without pre-approval as seems trivial? happy to close and discuss if preferred)

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • Performance improvement (includes benchmark results below)
  • Documentation update
  • New feature or public API (requires prior proposal or issue approval)
  • Refactor / internal cleanup (no user-visible change)
  • Build, CI, or tooling change

Motivation

MAX models with grouped convs would fail to compile despite the kernels supporting grouped conv

repro:

import os
from importlib.metadata import version

from max.driver import CPU
from max.dtype import DType
from max.engine import InferenceSession
from max.graph import DeviceRef, Graph, TensorType, ops

os.environ.setdefault("MODULAR_MAX_DEBUG", "True")
print(f"modular = {version('modular')},  max = {version('max')}, mojo = {version('mojo')}\n")

dev = DeviceRef.CPU()
x = TensorType(DType.float32, [1, 8, 8, 4], device=dev)        # NHWC
f = TensorType(DType.float32, [3, 3, 2, 4], device=dev)        # RSCF
with Graph("g", input_types=[x, f]) as g:
    xi, fi = g.inputs
    g.output(ops.conv2d(xi, fi, padding=(1, 1, 1, 1), groups=2))
InferenceSession(devices=[CPU()]).load(g)   # <-- raises "Failed to compile the model"
uv run --with modular==26.4.0 --prerelease allow repro_5614_tiny.py

What changed

  • pack the filter before passing to the conv kernel
  • added a test case to the existing test, note that this test is currently skipped

Testing

I have not built max against the contents of this PR, I don't think this is currently possible with what has been open sourced?

To prove the fix I wrote a small custom op that does exactly
what the grouped CPU lowering should do — pack_conv_filter_shapepack_filter
(RSCF→FRSCf) → ConvDirectNHWC[..., filter_packed=True].run(...) — and called it
from the graph API. It compiles and matches a NumPy grouped-conv reference:

input NHWC = (1, 8, 8, 8), out_channels = 8, kernel = 3x3, groups = 4
[OK] grouped_conv2d custom op compiled + ran -> output shape (1, 8, 8, 8)
     max abs diff vs NumPy grouped-conv reference: 1.49e-07
     equivalent: True

Checklist

  • The linked issue above has been reviewed by a maintainer and is
    agreed-upon, or this is a trivial fix that does not need prior
    approval
  • PR is small and focused — I've split larger changes into a sequence of
    smaller PRs where possible (see
    pull request sizes)
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description (see
    AI Tool Use Policy)

tried to run ./bazelw run format but hit some git error, I think due to my branch naming convention differing from what's expected?

@BradLarson

Copy link
Copy Markdown
Member

@alisterburt

Copy link
Copy Markdown
Author

@BradLarson thanks for the tip! I had to work around a few little issues to run the test via bazel on my mac but have now confirmed the model successfully compiles in the test after this PR where it previously failed.

Note the test is still skioped so wouldn't yet make a regression visible in CI. I considered a separate test but was aiming to keep changes minimal :-)

@tboerstad

tboerstad commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Thanks for helping out!

I have an internal PR up now which removes the skip for this test, I'll try to get that out in the next nightly release.
After that we're in a good shape to merge this PR.

Edit: It's been merged

@tboerstad

Copy link
Copy Markdown
Contributor

modularbot pushed a commit that referenced this pull request Jun 24, 2026
reference

The conv2d accuracy test was skipped due to an unknown accuracy.
This was the reference using TF32 instead of F32. This PR changes the
reference to TF32 and enables the test.
This is in preparation for #6710.

MODULAR_ORIG_COMMIT_REV_ID: 4ffd7bf95d708269cfaf212ac419b34ce3761991
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Cannot use grouped convolutions in graph mode

3 participants