Fix LoRA adapter support for convolutional layers (fixes #3056) by oliver0006 · Pull Request #3064 · speechbrain/speechbrain · GitHub

Fix LoRA adapter support for convolutional layers (fixes #3056) #3064

Open
oliver0006 wants to merge 1 commit into speechbrain:develop from oliver0006:fix/lora-conv-support

@oliver0006

What this does

Fixes #3056

The LoRA adapter class claimed to support nn.Conv layers but assumed nn.Linear semantics, crashing with "RuntimeError: mat1 and mat2 shapes cannot be multiplied" on any convolutional layer (e.g. the wav2vec2-style repro in the issue).

This PR implements real convolutional support instead of removing the claim, following the approach used by HF peft:

  • Down projection: a conv of the same type mirroring the pretrained layer's geometry (kernel_size, stride, padding, dilation, padding_mode), mapping onto rank channels — so both branches produce outputs of identical shape
  • Up projection: a pointwise (1×1) conv from rank to out_channels, zero-initialized (standard LoRA init)
  • Grouped convolutions (groups != 1): clear ValueError instead of a cryptic crash
  • No behavior change for nn.Linear (and other weight-matrix modules); checkpoint attribute names (adapter_down_proj/adapter_up_proj) are preserved so existing checkpoints keep loading
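
For readers who want to see the shape of the idea, here is a minimal sketch of the design described above. It is not the code in this PR: the class name ConvLoRASketch and its constructor arguments are hypothetical, and it assumes plain PyTorch with an ungrouped nn.Conv1d/2d/3d layer. Only the attribute names adapter_down_proj and adapter_up_proj are taken from the description.

```python
import torch
import torch.nn as nn


class ConvLoRASketch(nn.Module):
    """Illustrative LoRA wrapper for an ungrouped conv layer (hypothetical class)."""

    def __init__(self, pretrained, rank=4, alpha=1.0):
        super().__init__()
        if not isinstance(pretrained, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
            raise TypeError("this sketch only handles nn.Conv1d/2d/3d")
        if pretrained.groups != 1:
            raise ValueError("grouped convolutions (groups != 1) are not supported")

        self.pretrained = pretrained
        for param in self.pretrained.parameters():
            param.requires_grad = False  # pretrained weights stay frozen

        conv_cls = type(pretrained)
        # Down projection: mirrors the pretrained geometry, maps onto `rank` channels,
        # so it produces the same spatial/temporal size as the pretrained branch.
        self.adapter_down_proj = conv_cls(
            pretrained.in_channels,
            rank,
            kernel_size=pretrained.kernel_size,
            stride=pretrained.stride,
            padding=pretrained.padding,
            dilation=pretrained.dilation,
            padding_mode=pretrained.padding_mode,
            bias=False,
        )
        # Up projection: pointwise (kernel_size=1) conv from `rank` to out_channels,
        # zero-initialized so the adapted layer starts as an exact copy of the base.
        self.adapter_up_proj = conv_cls(
            rank, pretrained.out_channels, kernel_size=1, bias=False
        )
        nn.init.zeros_(self.adapter_up_proj.weight)
        self.scaling = alpha / rank

    def forward(self, x):
        update = self.adapter_up_proj(self.adapter_down_proj(x))
        return self.pretrained(x) + update * self.scaling


conv = nn.Conv1d(8, 16, kernel_size=3, stride=2, padding=1)
adapted = ConvLoRASketch(conv, rank=4)
x = torch.randn(2, 8, 50)
print(adapted(x).shape == conv(x).shape)
```

Because the down projection copies stride, padding and dilation, and the pointwise up projection leaves the non-channel dimensions untouched, the two branches can be summed without any reshaping.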

Testing

New tests/unittests/test_adapters.py with 9 tests verifying the core LoRA properties:

  • Identity at init (zero up-projection ⇒ adapted output == pretrained output), for Linear, Conv1d, Conv2d and Conv3d
  • Exact match with the manual formula base(x) + up(down(x)) * alpha/rank
  • Pretrained weights frozen (no grads) while adapter weights receive gradients and actually learn (one SGD step changes the output, pretrained output unchanged)
  • Shape equality across geometries (stride=5, padding='same', dilation=2)
  • groups != 1 raises ValueError
  • The exact AdaptedModel(all_conv=True) repro from the issue, forward + backward, with only adapter params trainable
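
The first three properties in this list can be checked in a few lines. The sketch below is an illustration only, not the contents of tests/unittests/test_adapters.py: it builds the manual formula base(x) + up(down(x)) * alpha/rank directly from nn.Linear layers, and every variable name and hyperparameter in it is an assumption chosen for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rank, alpha = 4, 8.0
base = nn.Linear(10, 6)
for p in base.parameters():
    p.requires_grad = False  # pretrained weights are frozen

down = nn.Linear(10, rank, bias=False)
up = nn.Linear(rank, 6, bias=False)
nn.init.zeros_(up.weight)  # standard LoRA init: zero up projection


def adapted(x):
    # Manual LoRA formula from the list above.
    return base(x) + up(down(x)) * (alpha / rank)


x = torch.randn(3, 10)
reference = base(x).detach().clone()

# Identity at init: the zero up projection contributes nothing.
assert torch.allclose(adapted(x), reference)

# One SGD step on the adapter parameters only.
optimizer = torch.optim.SGD(list(down.parameters()) + list(up.parameters()), lr=0.1)
adapted(x).pow(2).sum().backward()
assert base.weight.grad is None      # frozen weights receive no gradient
assert up.weight.grad is not None    # adapter weights do
optimizer.step()

# The adapted output has moved; the pretrained output has not.
assert not torch.allclose(adapted(x), reference)
assert torch.allclose(base(x), reference)
```

Note that with a zero-initialized up projection, the first step only moves the up projection: the gradient reaching the down projection passes through the zero weights and is therefore zero.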

Also added a Conv1d doctest to the LoRA docstring. All 9 tests + 3 doctests pass; ruff check, ruff format and codespell pass on the changed files.


Co-written by a human (@oliver0006) and Claude AI (Fable 5) working together.

🤖 Generated with Claude Code

The LoRA class claimed to work with nn.Conv layers but assumed
nn.Linear semantics (weight shape and linear projections), crashing
on any convolutional layer. The adapter projections now mirror the
geometry of the pretrained convolution (following the approach of HF
peft): the down projection reuses kernel/stride/padding/dilation to
map onto rank channels, and the up projection is a pointwise
convolution initialized to zero. Grouped convolutions raise a clear
error. Linear behavior and checkpoint attribute names are unchanged.

Fixes speechbrain#3056

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

LoRA adapters not working on convolutional layers
