Conversation
Pull request overview
This PR adds repository guidance Markdown aimed at helping agentic coding tools (e.g., Claude Code, Codex) understand SpeechBrain’s structure, conventions, and workflows.
Changes:
- Added a new `AGENTS.md` guide describing project structure, core architecture concepts, recipe conventions, and common pitfalls.
- Added `CLAUDE.md`, intended to point Claude-based agents at the main instructions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
PS: I am not a prompt engineer, so I believe things could be improved (e.g., should it be more concise?), but I do believe the only way to know is to start somewhere and then slowly update and build on top.
pplantinga
left a comment
Looks like a good start. My main comment is that we may want to start thinking about specific agentic workflows and design dedicated files for them, such as "adding a new feature to SpeechBrain core", which could be covered by an AGENTS.md file in the speechbrain/ folder with more detailed instructions on how to run tests, write unit tests, and ensure everything is working. Or a "how to write a new recipe" file for the recipes folder, etc. Not sure if it is necessary now, but it could be nice to do while we are thinking about it.
> ## Recipe conventions
>
> Every recipe lives at `recipes/{dataset}/{task}/{mdeol}` and follows this structure:
Suggested change:
```diff
- Every recipe lives at `recipes/{dataset}/{task}/{mdeol}` and follows this structure:
+ Every recipe lives at `recipes/{dataset}/{task}/{model}` and follows this structure:
```
> ```
> pytest tests/integration/ -x
> ```
>
> Pre-commit hooks are configured in `.pre-commit-config.yaml` and enforce formatting/linting automatically. Always run `pre-commit run -a` before opening a PR.
Not sure how installing the pre-commit hooks interacts with agents here. It looks like this assumes the agent will always run the checks manually rather than installing the hook. Just wanted to check that this is what we want to do, as there is some risk of the agent forgetting this step (not the end of the world, but it could be annoying).
> - **Stage-dependent logic**: always check `stage` before computing validation-only metrics or applying train-only augmentation. Forgetting this causes training-time metric computation (slow) or test-time augmentation (wrong results).
> - **Batch format**: batch objects from the dataio pipeline are `PaddedBatch` instances. Access signals as `batch.sig`, which returns a `(tensor, lengths)` tuple. Do not index the batch like a plain dict.
> - **Checkpointing**: Brain's checkpointer saves/loads modules, optimizers, schedulers, and epoch counters. If you add a new trainable module, register it with the checkpointer or it won't be saved/restored.
> - **Soundfile vs torchaudio**: SpeechBrain is migrating audio I/O from torchaudio to soundfile. Use `speechbrain.dataio.dataio.read_audio` for reading audio, not raw torchaudio calls.
Suggested change:
```diff
- - **Soundfile vs torchaudio**: SpeechBrain is migrating audio I/O from torchaudio to soundfile. Use `speechbrain.dataio.dataio.read_audio` for reading audio, not raw torchaudio calls.
+ - **Soundfile vs torchaudio**: SpeechBrain uses a soundfile backend for audio I/O. Use `speechbrain.dataio.dataio.read_audio` for reading audio, not raw torchaudio calls.
```
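The stage-check pitfall quoted above can be sketched without SpeechBrain installed. This is a hypothetical, dependency-free mock: the real enum is `speechbrain.Stage` and the real hook lives on a `Brain` subclass; only the pattern is shown here.

```python
from enum import Enum, auto

# Stand-in for speechbrain.Stage (mock for illustration only).
class Stage(Enum):
    TRAIN = auto()
    VALID = auto()
    TEST = auto()

def compute_objectives(predictions, targets, stage, error_metric=None):
    """Toy loss (mean absolute error) demonstrating the stage-check pattern."""
    loss = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
    # Only accumulate extra metrics outside training: computing them on every
    # training step is the "slow" pitfall the guide warns about.
    if stage != Stage.TRAIN and error_metric is not None:
        error_metric.append(loss)
    return loss

errors = []
train_loss = compute_objectives([1.0, 2.0], [1.0, 4.0], Stage.TRAIN, errors)
valid_loss = compute_objectives([1.0, 2.0], [1.0, 4.0], Stage.VALID, errors)
# Metrics were recorded only for the VALID pass.
```

The same guard works in reverse for train-only augmentation (`if stage == Stage.TRAIN: ...`), which is the other half of the pitfall.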
> - PyTorch (core)
> - HyperPyYAML (`hyperpyyaml` package — SpeechBrain's extended YAML, separate repo at `speechbrain/HyperPyYAML`)
> - soundfile (audio I/O)
> - torchaudio (some legacy audio I/O, being phased out)
Suggested change:
```diff
- - torchaudio (some legacy audio I/O, being phased out)
+ - torchaudio (basic feature transforms, resampling, etc.)
```
> Every recipe wires this together in a `dataio_prep(hparams)` function — follow this pattern for new recipes.
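The `dataio_prep(hparams)` shape can be sketched in pure Python. This is only a mock of the pattern: real recipes build `DynamicItemDataset` objects and decorate pipelines with `speechbrain.utils.data_pipeline` helpers, and the hparams keys below (`train_annotation`, `valid_annotation`) are assumptions, not a fixed API.

```python
# Hypothetical, dependency-free sketch of the dataio_prep(hparams) pattern.
# Plain dicts and lists stand in for SpeechBrain's dataset objects.

def dataio_prep(hparams):
    """Build per-split 'datasets' from annotation manifests in hparams."""
    def audio_pipeline(path):
        # Placeholder for speechbrain.dataio.dataio.read_audio(path).
        return f"signal<{path}>"

    datasets = {}
    for split in ("train", "valid"):
        manifest = hparams[f"{split}_annotation"]  # assumed key names
        datasets[split] = [
            {"id": utt_id, "sig": audio_pipeline(path)}
            for utt_id, path in manifest.items()
        ]
    return datasets

hparams = {
    "train_annotation": {"utt1": "a.wav"},
    "valid_annotation": {"utt2": "b.wav"},
}
datasets = dataio_prep(hparams)
```

The point of the convention is that all data loading and transformation is wired in one function that receives the already-loaded hparams, so recipes stay uniform.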
> ## Recipe conventions
Should we actually have additional AGENTS.md files in key top-level folders as well? Like one for recipes, one for tests, one for speechbrain folder itself? Just wondering if it might be helpful to have more specific instructions depending on what the agent is trying to do.
> - **HyperPyYAML is not plain YAML**: do not treat `.yaml` files as simple config. `!new:` instantiates objects, `!ref` resolves references. Editing these files requires understanding the tag system. If you break a `!ref` chain, training will crash at load time.
> - **Relative lengths, not absolute**: SpeechBrain passes relative lengths (0 to 1) for masking/padding. Do not pass absolute sample counts where relative lengths are expected.
> - **modules vs hparams**: objects listed under `modules:` in the YAML are registered as `nn.Module`s on the Brain (moved to device, included in DDP, saved in checkpoints). Objects accessed via `self.hparams.*` are not. Putting a trainable module only in hparams means it won't be on the right device or saved properly.
The "saved in checkpoints" is unrelated to "modules" I think.
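The relative-lengths convention flagged in the bullets above can be illustrated without torch. This is a pure-Python stand-in for what `PaddedBatch` computes (real code returns tensors and the exact padding logic may differ):

```python
# Sketch of SpeechBrain's relative-length convention: lengths are fractions
# of the padded (max) length, not absolute sample counts.

def pad_with_relative_lengths(signals):
    """Zero-pad variable-length signals; return padded batch + relative lengths."""
    max_len = max(len(s) for s in signals)
    padded = [s + [0.0] * (max_len - len(s)) for s in signals]
    rel_lens = [len(s) / max_len for s in signals]
    return padded, rel_lens

def length_mask(rel_lens, max_len):
    """Rebuild a boolean validity mask from relative lengths, as masking code does."""
    return [[t < round(r * max_len) for t in range(max_len)] for r in rel_lens]

padded, rel = pad_with_relative_lengths([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]])
mask = length_mask(rel, len(padded[0]))
```

Here `rel` is `[1.0, 0.5]`: passing the absolute count `2` where `0.5` is expected would make every downstream mask wrong, which is exactly the pitfall the guide warns about.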

This PR adds Markdown instructions for agentic models (e.g., Claude Code, Codex) to help them navigate the codebase.
These files are meant to evolve over time and gradually reflect the common issues LLMs may encounter when working with the SpeechBrain codebase. This first PR is intended as a prototype in that direction.