
alisterburt · 2026-06-23T00:50:56Z

Assisted-by: Claude Opus 4.8

Linked issue

Fixes #5614. (PR opened without pre-approval since the fix seems trivial; happy to close and discuss if preferred.)

Type of change

Bug fix (non-breaking change that fixes an issue)
Performance improvement (includes benchmark results below)
Documentation update
New feature or public API (requires prior proposal or issue approval)
Refactor / internal cleanup (no user-visible change)
Build, CI, or tooling change

Motivation

MAX models with grouped convolutions fail to compile on CPU, even though the kernels support grouped convolution.

repro:

import os
from importlib.metadata import version

from max.driver import CPU
from max.dtype import DType
from max.engine import InferenceSession
from max.graph import DeviceRef, Graph, TensorType, ops

os.environ.setdefault("MODULAR_MAX_DEBUG", "True")
print(f"modular = {version('modular')},  max = {version('max')}, mojo = {version('mojo')}\n")

dev = DeviceRef.CPU()
x = TensorType(DType.float32, [1, 8, 8, 4], device=dev)        # NHWC
f = TensorType(DType.float32, [3, 3, 2, 4], device=dev)        # RSCF
with Graph("g", input_types=[x, f]) as g:
    xi, fi = g.inputs
    g.output(ops.conv2d(xi, fi, padding=(1, 1, 1, 1), groups=2))
InferenceSession(devices=[CPU()]).load(g)   # <-- raises "Failed to compile the model"

uv run --with modular==26.4.0 --prerelease allow repro_5614_tiny.py
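
For readers checking the shapes in the repro, here is a minimal sketch of the shape rule a grouped convolution has to satisfy: with an NHWC input and an RSCF filter, the filter's C dimension is the input channel count divided by `groups`. The helper name `grouped_conv2d_output_shape` is hypothetical and not part of MAX, and the (top, bottom, left, right) padding order is an assumption of this sketch (it makes no difference for the all-ones padding used above). Stride 1 and dilation 1 are assumed.

```python
def grouped_conv2d_output_shape(input_nhwc, filter_rscf, groups, padding=(0, 0, 0, 0)):
    """Return the NHWC output shape of a stride-1, dilation-1 grouped conv.

    Hypothetical helper for illustration only; padding is assumed to be
    ordered (top, bottom, left, right).
    """
    n, h, w, c_in = input_nhwc
    r, s, c_per_group, f = filter_rscf
    if c_in % groups or f % groups:
        raise ValueError("input channels and filters must be divisible by groups")
    if c_per_group != c_in // groups:
        raise ValueError(f"filter C dim must be {c_in // groups}, got {c_per_group}")
    top, bottom, left, right = padding
    return (n, h + top + bottom - r + 1, w + left + right - s + 1, f)


# Shapes from the repro: 4 input channels, 2 groups, so the filter C dim is 2.
shape = grouped_conv2d_output_shape(
    (1, 8, 8, 4), (3, 3, 2, 4), groups=2, padding=(1, 1, 1, 1)
)
print(shape)
```

So the graph in the repro is well-formed; the failure is in the lowering, not in the shapes.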

What changed

Pack the filter before passing it to the conv kernel.
Added a test case to the existing test. Note that this test is currently skipped.
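
To illustrate what packing means here, the following is a rough NumPy sketch of an RSCF to FRSCf re-layout. It assumes that `f` is a small block of output channels (a micro-kernel width), so that the packed tensor has shape [F/f, R, S, C, f]. That interpretation, the function name `pack_filter_rscf_to_frscf`, and the `micro_f` parameter are assumptions for illustration; the real `pack_filter` kernel chooses its own block width and may handle remainders that this sketch rejects.

```python
import numpy as np


def pack_filter_rscf_to_frscf(filt, groups, micro_f):
    """Pack an RSCF filter into [F/f, R, S, C, f] blocks (illustrative only).

    Assumes each group's filter count is a multiple of micro_f, so that no
    block of f output channels straddles two groups.
    """
    r, s, c, f_total = filt.shape
    if f_total % groups or (f_total // groups) % micro_f:
        raise ValueError("sketch assumes F divides evenly into groups and blocks")
    num_blocks = f_total // micro_f
    # Split the F axis into (block, lane): f_index = block * micro_f + lane.
    split = filt.reshape(r, s, c, num_blocks, micro_f)
    # Move the block axis to the front: [F/f, R, S, C, f].
    return np.ascontiguousarray(split.transpose(3, 0, 1, 2, 4))


# Filter shape from the repro: R=3, S=3, C=2 (per group), F=4, with 2 groups.
filt = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape(3, 3, 2, 4)
packed = pack_filter_rscf_to_frscf(filt, groups=2, micro_f=2)
print(packed.shape)
```

With `micro_f=2` and two filters per group, each packed block holds exactly one group's filters, which is the property the grouped code path relies on in this sketch.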

Testing

I have not built MAX against the contents of this PR; I don't think that is currently possible with what has been open sourced.

To prove the fix I wrote a small custom op that does exactly
what the grouped CPU lowering should do — pack_conv_filter_shape → pack_filter
(RSCF→FRSCf) → ConvDirectNHWC[..., filter_packed=True].run(...) — and called it
from the graph API. It compiles and matches a NumPy grouped-conv reference:

input NHWC = (1, 8, 8, 8), out_channels = 8, kernel = 3x3, groups = 4
[OK] grouped_conv2d custom op compiled + ran -> output shape (1, 8, 8, 8)
     max abs diff vs NumPy grouped-conv reference: 1.49e-07
     equivalent: True
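
For context, a NumPy grouped-conv reference of the kind mentioned above could look roughly like the sketch below. This is not the script used to produce the numbers above; the function name `grouped_conv2d_nhwc` is hypothetical, and stride 1, dilation 1, and symmetric padding of 1 are assumptions chosen to match the reported shapes (8x8 input, 3x3 kernel, 8x8 output).

```python
import numpy as np


def grouped_conv2d_nhwc(x, filt, groups, pad=1):
    """Stride-1 grouped conv reference: x is NHWC, filt is RSCF with C = C_in / groups."""
    n, h, w, c_in = x.shape
    r, s, c_g, f_total = filt.shape
    assert c_in == c_g * groups and f_total % groups == 0
    f_g = f_total // groups
    # Zero-pad only the two spatial axes.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    h_out, w_out = h + 2 * pad - r + 1, w + 2 * pad - s + 1
    out = np.zeros((n, h_out, w_out, f_total), dtype=x.dtype)
    for g in range(groups):
        x_g = xp[..., g * c_g:(g + 1) * c_g]    # this group's input channels
        w_g = filt[..., g * f_g:(g + 1) * f_g]  # this group's filters
        for i in range(r):
            for j in range(s):
                # Shifted window times a (C_g, F_g) matrix, accumulated per tap.
                patch = x_g[:, i:i + h_out, j:j + w_out, :]
                out[..., g * f_g:(g + 1) * f_g] += patch @ w_g[i, j]
    return out


# Shapes from the log above: NHWC input (1, 8, 8, 8), 8 output channels, groups = 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 8, 8)).astype(np.float32)
filt = rng.standard_normal((3, 3, 2, 8)).astype(np.float32)
y = grouped_conv2d_nhwc(x, filt, groups=4)
print(y.shape)
```

A quick sanity check for a reference like this is to compare it against an ordinary (groups=1) convolution whose filter is block-diagonal across channels; the two should agree to within float32 rounding.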

Checklist

The linked issue above has been reviewed by a maintainer and is
agreed-upon, or this is a trivial fix that does not need prior
approval
PR is small and focused — I've split larger changes into a sequence of
smaller PRs where possible (see
pull request sizes)
I ran ./bazelw run format to format my changes
I added or updated tests to cover my changes
If AI tools assisted with this contribution, I have included an
Assisted-by: trailer in my commit message or this PR description (see
AI Tool Use Policy)

I tried to run ./bazelw run format but hit a git error, I think because my branch naming convention differs from what's expected.

BradLarson · 2026-06-23T02:43:25Z

alisterburt · 2026-06-23T05:43:22Z

@BradLarson thanks for the tip! I had to work around a few little issues to run the test via bazel on my mac but have now confirmed the model successfully compiles in the test after this PR where it previously failed.

Note the test is still skipped, so it wouldn't yet make a regression visible in CI. I considered a separate test but was aiming to keep changes minimal :-)

tboerstad · 2026-06-23T07:45:00Z

Thanks for helping out!

I have an internal PR up now which removes the skip for this test, I'll try to get that out in the next nightly release.
After that we're in good shape to merge this PR.

Edit: It's been merged

tboerstad · 2026-06-23T08:15:12Z

The conv2d accuracy test was skipped due to an unknown accuracy issue. The cause was the reference using TF32 instead of F32. This PR changes the reference to TF32 and enables the test. This is in preparation for #6710.

MODULAR_ORIG_COMMIT_REV_ID: 4ffd7bf95d708269cfaf212ac419b34ce3761991

alisterburt added 2 commits June 22, 2026 15:42

fix grouped convolution (gh issue modular#5614) on CPU by packing filter

155d359

add grouped code path to test_conv.py

1b4ebae

alisterburt requested review from a team as code owners June 23, 2026 00:50

github-actions Bot added the waiting-on-review label Jun 23, 2026

Fix/5614 grouped conv cpu lowering#6710

alisterburt wants to merge 2 commits into modular:main from alisterburt:fix/5614-grouped-conv-cpu-lowering
