get-transcripts

Transcribe video files, audio files, and YouTube URLs locally using OpenAI Whisper on Apple Silicon — no cloud, no API keys, no cost per minute.

Optimised for Mac Mini M4 / MacBook Pro M-series using the MPS (Metal Performance Shaders) backend for GPU-accelerated transcription. A 30-minute video transcribes in ~8 minutes on M4 with the medium model.

Features

  • Local & private — audio never leaves your machine
  • YouTube support — paste a URL and get a transcript directly (via yt-dlp)
  • All Whisper models — tiny through large-v3, selectable with --model
  • 5 output formats — .txt, .vtt, .srt, .tsv, .json generated in one pass
  • Reusable Python module — import src.transcribe in your own scripts
  • Shell script wrapper — one command, no Python required
  • 30 tests — unit + integration, CI-friendly

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • micromamba for Python environment management
  • Homebrew for system packages
  • ffmpeg — installed with Homebrew in the first setup step below

Setup

# 1. Install ffmpeg
brew install ffmpeg

# 2. Clone the repo
git clone https://github.com/troyscott/get-transcripts.git
cd get-transcripts

# 3. Create the Python environment
micromamba env create -f environment.yml
micromamba activate transcribe

# 4. Pre-download the default model (~1.5 GB, one-time — cached to ~/.cache/whisper/)
python -c "import whisper; whisper.load_model('medium')"

Verify your GPU is available:

python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"
# Expected: MPS available: True

Usage

Shell script (quickest)

# Local file (video or audio)
./scripts/run.sh demo.mp4

# YouTube URL
./scripts/run.sh https://youtu.be/dQw4w9WgXcQ

# Different model
./scripts/run.sh demo.mp4 --model large-v3

Python CLI (more options)

micromamba activate transcribe

# Local file
python -m src.cli demo.mp4

# YouTube URL
python -m src.cli https://youtu.be/dQw4w9WgXcQ --model small

# All options
python -m src.cli demo.mp4 --model medium --device mps --output-dir ./out --language en
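The flags in the "All options" line map naturally onto argparse. The sketch below is a hypothetical reconstruction for illustration, not the contents of the repo's cli.py: the defaults (medium, mps, ./output) are taken from this README, while the restricted device choices and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown above (not the repo's cli.py)."""
    parser = argparse.ArgumentParser(prog="get-transcripts")
    parser.add_argument("source", help="local media file or YouTube URL")
    parser.add_argument("--model", default="medium", help="Whisper model name")
    # Assumption: only the two devices mentioned in this README are offered.
    parser.add_argument("--device", default="mps", choices=["mps", "cpu"])
    parser.add_argument("--output-dir", type=Path, default=Path("./output"))
    parser.add_argument("--language", default=None, help="language code such as 'en'")
    return parser


# Parse the same arguments as the "All options" example.
args = build_parser().parse_args(
    ["demo.mp4", "--model", "medium", "--device", "mps",
     "--output-dir", "./out", "--language", "en"]
)
print(args.source, args.model, args.device, args.output_dir, args.language)
```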

As a Python module

from pathlib import Path
from src.transcribe import run

result = run(
    "demo.mp4",                  # or a YouTube URL
    model_name="medium",
    device="mps",
    output_dir=Path("./output"),
)
print(result["text"])
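To transcribe several sources in one go, the same call can be wrapped in a loop. This is a minimal sketch: transcribe_all is a hypothetical helper, and it takes the run function as a parameter so it can be tried without the repo installed. It assumes only what the example above shows, namely that run accepts model_name, device, and output_dir and returns a dict with a "text" key.

```python
from pathlib import Path
from typing import Callable, Iterable


def transcribe_all(
    sources: Iterable[str],
    run_fn: Callable[..., dict],
    output_dir: Path = Path("./output"),
) -> dict[str, str]:
    """Call run_fn once per source and collect each transcript's text."""
    texts: dict[str, str] = {}
    for source in sources:
        result = run_fn(source, model_name="medium", device="mps", output_dir=output_dir)
        texts[source] = result["text"]
    return texts


# Usage inside the repo, with the environment active:
# from src.transcribe import run
# texts = transcribe_all(["demo.mp4", "https://youtu.be/dQw4w9WgXcQ"], run)
```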

Available models

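| Model    | Size   | Speed (M4 MPS) | Best for      |
|----------|--------|----------------|---------------|
| tiny     | 75 MB  | ~1 min/hr      | Testing       |
| base     | 145 MB | ~2 min/hr      | Fast drafts   |
| small    | 460 MB | ~4 min/hr      | Good accuracy |
| medium   | 1.5 GB | ~8 min/hr      | Default       |
| large-v3 | 3 GB   | ~16 min/hr     | Best accuracy |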
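The speed column can be turned into a rough time estimate. The sketch below simply scales the table's approximate figures linearly; real timings will vary with the audio and the machine, and the helper name is illustrative.

```python
# Approximate processing minutes per hour of audio on M4 MPS, from the table above.
MINUTES_PER_HOUR = {"tiny": 1, "base": 2, "small": 4, "medium": 8, "large-v3": 16}


def estimate_minutes(audio_minutes: float, model: str = "medium") -> float:
    """Scale the table's per-hour figure linearly to the given audio length."""
    return audio_minutes / 60 * MINUTES_PER_HOUR[model]


print(estimate_minutes(90, "large-v3"))  # 90 minutes of audio -> 24.0
```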

Outputs are written to ./output/:

| File  | Description                       |
|-------|-----------------------------------|
| .txt  | Plain text transcript             |
| .vtt  | Timestamped captions (WebVTT)     |
| .srt  | Timestamped captions (SubRip)     |
| .tsv  | Tab-separated with timestamps     |
| .json | Full Whisper output with metadata |
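The .json file is the easiest one to post-process. The sketch below assumes it holds Whisper's usual result structure (a "segments" list whose entries carry "start", "end", and "text"). The file name output/demo.json is also an assumption, since the naming scheme is not stated here.

```python
import json
from pathlib import Path


def format_segments(json_path: Path) -> list[str]:
    """Return one '[start -> end] text' line per segment in a Whisper JSON file."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    lines = []
    for seg in data.get("segments", []):
        lines.append(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
    return lines


# Usage (path is hypothetical):
# for line in format_segments(Path("output/demo.json")):
#     print(line)
```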

Running tests

# Unit tests only (fast, no model download required)
micromamba run -n transcribe pytest tests/unit/ -v

# Full integration test (downloads tiny model ~75 MB on first run)
micromamba run -n transcribe pytest tests/ -v --run-integration

Common issues

MPS available: False — Requires Apple Silicon + macOS 12.3+. Fall back to --device cpu in scripts/run.sh (slower).
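When calling the module from your own scripts, the same fallback can be made automatic. This is a small sketch rather than part of the repo: pick_device is a hypothetical helper that returns "mps" only when PyTorch reports it as available, and "cpu" otherwise.

```python
def pick_device(prefer: str = "mps") -> str:
    """Return 'mps' if requested and available, otherwise fall back to 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so nothing to accelerate
    mps = getattr(torch.backends, "mps", None)  # absent on old PyTorch builds
    if prefer == "mps" and mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("Using device:", pick_device())
```

The result can be passed straight to the device argument shown in the module example.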

Hallucinated captions during silence — Trim silent sections with ffmpeg before transcribing:

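# -to takes an end position, not an offset from the end; 00:29:55 is an example for a 30-minute file
ffmpeg -i in.mp4 -ss 00:00:05 -to 00:29:55 -c copy trimmed.mp4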
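Because ffmpeg's -to expects an end position rather than an offset from the end of the file, cutting a fixed amount off both ends requires knowing the duration first. The sketch below reads it with ffprobe and builds the matching ffmpeg command. The helper names and the 5-second padding are illustrative.

```python
import subprocess


def probe_duration(path: str) -> float:
    """Ask ffprobe for the container duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def build_trim_cmd(src: str, dst: str, duration: float, pad: float = 5.0) -> list[str]:
    """Build an ffmpeg command that drops `pad` seconds from each end."""
    if duration <= 2 * pad:
        raise ValueError("file is too short to trim")
    return ["ffmpeg", "-i", src, "-ss", f"{pad:.3f}", "-to", f"{duration - pad:.3f}",
            "-c", "copy", dst]


# Usage:
# cmd = build_trim_cmd("in.mp4", "trimmed.mp4", probe_duration("in.mp4"))
# subprocess.run(cmd, check=True)
```

With -c copy the cut points snap to the nearest keyframes, so the trimmed file may start or end slightly off the requested times.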

Domain terms mis-transcribed (e.g. ASTM → "ASTAM") — Do a find-and-replace pass on the .vtt output before use.
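The find-and-replace pass can be scripted so the same corrections apply to every transcript. A minimal sketch, assuming a hand-maintained dictionary of known mis-transcriptions. fix_terms and the file path in the usage note are hypothetical.

```python
import re


def fix_terms(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word occurrences of each mis-transcribed term."""
    for wrong, right in corrections.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement keeps backslashes in `right` literal.
        text = re.sub(pattern, lambda _m, r=right: r, text)
    return text


print(fix_terms("Tested per ASTAM D638.", {"ASTAM": "ASTM"}))

# Usage on a caption file (path is hypothetical):
# from pathlib import Path
# vtt = Path("output/demo.vtt")
# vtt.write_text(fix_terms(vtt.read_text(encoding="utf-8"), {"ASTAM": "ASTM"}), encoding="utf-8")
```

The word-boundary match keeps the replacement from touching longer words that happen to contain the same letters.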

Project structure

get-transcripts/
├── src/
│   ├── transcribe.py       # Core module (extract, download, transcribe, run)
│   └── cli.py              # Argparse CLI entry point
├── scripts/
│   └── run.sh              # One-shot shell wrapper
├── tests/
│   ├── conftest.py         # Shared fixtures
│   ├── unit/
│   │   ├── test_audio.py              # ffmpeg extraction tests
│   │   └── test_transcribe_module.py  # Module unit + error path tests
│   └── integration/
│       └── test_transcription.py      # Full pipeline tests (tiny model)
├── environment.yml         # micromamba environment
├── pyproject.toml          # Project config + ruff rules
├── CLAUDE.md               # AI assistant context
└── SPEC.md                 # Design document
