Transcribe video files, audio files, and YouTube URLs locally using OpenAI Whisper on Apple Silicon — no cloud, no API keys, no cost per minute.
Optimised for Mac Mini M4 / MacBook Pro M-series using the MPS (Metal Performance Shaders) backend for GPU-accelerated transcription. A 30-minute video transcribes in ~8 minutes on M4 with the medium model.
- Local & private — audio never leaves your machine
- YouTube support — paste a URL and get a transcript directly (via yt-dlp)
- All Whisper models — `tiny` through `large-v3`, selectable with `--model`
- 5 output formats — `.txt`, `.vtt`, `.srt`, `.tsv`, `.json` generated in one pass
- Reusable Python module — import `src.transcribe` in your own scripts
- Shell script wrapper — one command, no Python required
- 30 tests — unit + integration, CI-friendly
- macOS with Apple Silicon (M1/M2/M3/M4)
- micromamba for Python environment management
- Homebrew for system packages
- ffmpeg (installed with Homebrew in step 1 of the setup below)
# 1. Install ffmpeg
brew install ffmpeg
# 2. Clone the repo
git clone https://github.com/troyscott/get-transcripts.git
cd get-transcripts
# 3. Create the Python environment
micromamba env create -f environment.yml
micromamba activate transcribe
# 4. Pre-download the default model (~1.5 GB, one-time — cached to ~/.cache/whisper/)
python -c "import whisper; whisper.load_model('medium')"

Verify your GPU is available:
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
# Expected: MPS available: True

# Local file (video or audio)
./scripts/run.sh demo.mp4
# YouTube URL
./scripts/run.sh https://youtu.be/dQw4w9WgXcQ
# Different model
./scripts/run.sh demo.mp4 --model large-v3

micromamba activate transcribe
# Local file
python -m src.cli demo.mp4
# YouTube URL
python -m src.cli https://youtu.be/dQw4w9WgXcQ --model small
# All options
python -m src.cli demo.mp4 --model medium --device mps --output-dir ./out --language en

from pathlib import Path
from src.transcribe import run
result = run(
    "demo.mp4",  # or a YouTube URL
    model_name="medium",
    device="mps",
    output_dir=Path("./output"),
)
print(result["text"])

| Model | Size | Speed (M4 MPS) | Best for |
|---|---|---|---|
| `tiny` | 75 MB | ~1 min/hr | Testing |
| `base` | 145 MB | ~2 min/hr | Fast drafts |
| `small` | 460 MB | ~4 min/hr | Good accuracy |
| `medium` | 1.5 GB | ~8 min/hr | Default |
| `large-v3` | 3 GB | ~16 min/hr | Best accuracy |
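
To pick a model before a long run, the speeds in the table can be turned into a rough time estimate. This is a minimal sketch that assumes "min/hr" means minutes of processing per hour of audio on an M4 with MPS; `estimate_minutes` is a hypothetical helper, not part of the repo, and real timings will vary with the machine and the audio.

```python
# Approximate processing minutes per hour of audio, copied from the table above.
# Assumption: "min/hr" means minutes of compute per hour of input audio.
MINUTES_PER_AUDIO_HOUR = {
    "tiny": 1,
    "base": 2,
    "small": 4,
    "medium": 8,
    "large-v3": 16,
}


def estimate_minutes(model: str, audio_minutes: float) -> float:
    """Return a rough transcription time in minutes for the given model."""
    if model not in MINUTES_PER_AUDIO_HOUR:
        raise ValueError(f"unknown model: {model!r}")
    return MINUTES_PER_AUDIO_HOUR[model] * audio_minutes / 60


if __name__ == "__main__":
    # Compare every model for a 90-minute recording.
    for name in MINUTES_PER_AUDIO_HOUR:
        print(f"{name:>8}: ~{estimate_minutes(name, 90):.1f} min")
```

Each step up in model size roughly doubles the processing time, so the estimate is mainly useful for deciding whether `large-v3` is worth the wait on a long file.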
Outputs are written to ./output/ in all five formats (.txt, .vtt, .srt, .tsv, .json).
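
The tool writes these files for you. The sketch below only illustrates how timed segments map onto the `.srt` layout, in case you want to post-process a result in your own script. It assumes the result carries a Whisper-style `"segments"` list whose items have `start`, `end` (seconds) and `text`; whether `run()` returns exactly that shape is an assumption, and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT cue blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: running index, time range, caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Welcome back."},
    ]
    print(segments_to_srt(demo))
```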
# Unit tests only (fast, no model download required)
micromamba run -n transcribe pytest tests/unit/ -v
# Full integration test (downloads tiny model ~75 MB on first run)
micromamba run -n transcribe pytest tests/ -v --run-integration

MPS available: False — Requires Apple Silicon + macOS 12.3+. Fall back to --device cpu in scripts/run.sh (slower).
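
When scripting against the Python module, the same fallback can be decided at runtime instead of by editing the shell script. This is a minimal sketch: `pick_device` is a hypothetical helper, not something the repo ships, and it relies only on `torch.backends.mps.is_available()`, the same check used in the setup steps.

```python
def pick_device() -> str:
    """Return "mps" when PyTorch reports it available, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch in this environment: CPU is the only safe answer.
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

The returned string can then be passed as `device=` to `run()` or as `--device` on the command line.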
Hallucinated captions during silence — Trim silent sections with ffmpeg before transcribing. `-to` needs an explicit end time, so for a 30-minute file, dropping the first and last 5 seconds looks like:
ffmpeg -i in.mp4 -ss 00:00:05 -to 00:29:55 -c copy trimmed.mp4

Domain terms mis-transcribed (e.g. ASTM → "ASTAM") — Do a find-and-replace pass on the .vtt output before use.
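
The find-and-replace pass can be scripted so the same corrections apply to every transcript. The sketch below is one way to do it with the standard library; the `CORRECTIONS` glossary and both function names are illustrative assumptions, so substitute your own terms.

```python
import re
from pathlib import Path

# Hypothetical glossary: mis-transcribed spelling -> correct term.
CORRECTIONS = {"ASTAM": "ASTM"}


def fix_terms(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each wrong term."""
    for wrong, right in corrections.items():
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement keeps backslashes in `right` literal.
        text = pattern.sub(lambda _match: right, text)
    return text


def fix_file(path: Path, corrections: dict[str, str]) -> None:
    """Rewrite a transcript file in place with the corrections applied."""
    original = path.read_text(encoding="utf-8")
    path.write_text(fix_terms(original, corrections), encoding="utf-8")


if __name__ == "__main__":
    print(fix_terms("Tested to the ASTAM standard.", CORRECTIONS))
```

Matching on word boundaries avoids rewriting longer words that merely contain a glossary entry. To apply it to a file, call `fix_file` with the path of the `.vtt` you want to correct.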
get-transcripts/
├── src/
│   ├── transcribe.py                  # Core module (extract, download, transcribe, run)
│   └── cli.py                         # Argparse CLI entry point
├── scripts/
│   └── run.sh                         # One-shot shell wrapper
├── tests/
│   ├── conftest.py                    # Shared fixtures
│   ├── unit/
│   │   ├── test_audio.py              # ffmpeg extraction tests
│   │   └── test_transcribe_module.py  # Module unit + error path tests
│   └── integration/
│       └── test_transcription.py      # Full pipeline tests (tiny model)
├── environment.yml                    # micromamba environment
├── pyproject.toml                     # Project config + ruff rules
├── CLAUDE.md                          # AI assistant context
└── SPEC.md                            # Design document
