Sunbelt Computer Software

AI Audio Processing Pipeline

An AI-powered pipeline for processing audio files to extract dialogue, perform speaker diarization, character attribution, and generate summaries.

Created to extract dialogue from, and summarize, Dungeons & Dragons gameplay sessions, but applicable to similar multi-speaker recordings.

Features

Audio Transcription: Convert audio files to text using Whisper
Speaker Diarization: Identify different speakers in audio
Dialogue Alignment: Merge transcripts with speaker information
Character Attribution: Map dialogue lines to characters using an LLM
Summarization: Generate scene summaries and beat sheets
Vector Search: Index and query processed content

Project Structure

├─ README.md
├─ .env                         # tokens + config
├─ requirements.txt
├─ data/
│  ├─ audio/                    # drop WAV/MP3 here
│  ├─ transcripts/              # whisper JSON + TXT
│  ├─ diarization/              # speaker turns (RTTM/JSON)
│  ├─ aligned/                  # transcript merged with speakers
│  ├─ attributed/               # character-attributed dialogue
│  └─ summaries/                # scene summaries/beat sheets
├─ chroma/                      # vector store
├─ app/
│  ├─ cli.py                    # Typer CLI entrypoint
│  ├─ asr_whisper.py            # transcription
│  ├─ diarize.py                # speaker diarization
│  ├─ align.py                  # align ASR segments ↔ speakers
│  ├─ attribute.py              # map lines to Characters via LLM
│  ├─ summarize.py              # scene/episode summaries
│  ├─ embed_index.py            # Chroma ingest + query
│  ├─ prompts.py                # prompt templates
│  └─ utils.py                  # ffmpeg, io helpers, chunking

Setup

Prerequisites

FFmpeg (required for audio processing):

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

Ollama (required for LLM inference):

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the model (20B parameter model recommended)
ollama pull gpt-oss:20b

Hugging Face Token (required for speaker diarization):
- Create an account at https://huggingface.co
- Get a read-only API token from your settings
- Accept the license for pyannote/speaker-diarization-3.1

Installation

Establish a virtual environment:

python3 -m venv ~/starfire_venv
source ~/starfire_venv/bin/activate
pip install -r requirements.txt

Copy .env-default into a new file, .env, then configure environment variables:
```
cp .env-default .env
# Edit .env and set your HF_TOKEN
```
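As a rough illustration of how these settings might be picked up, the sketch below parses simple KEY=VALUE lines and checks that HF_TOKEN is present. The helper names `parse_env` and `require_token` are hypothetical and are not taken from the project, which may well load its configuration differently (for example through a dotenv library).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_token(values: dict[str, str], name: str = "HF_TOKEN") -> str:
    """Return the named token, or raise if it is missing or empty."""
    token = values.get(name, "")
    if not token:
        raise ValueError(f"{name} is not set; edit .env before running diarization")
    return token
```

Failing early on a missing token is cheaper than discovering it partway through a long diarization run.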
Place audio files in data/audio/

Audio Preparation

If you have MP3 files, convert them to the required WAV format:

# Batch convert all MP3 files to WAV (16kHz mono)
for f in data/audio/*.mp3; do
  ffmpeg -i "$f" -ac 1 -ar 16000 -c:a pcm_s16le "${f%.mp3}.wav"
done
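The same conversion can be driven from Python, which may be convenient on systems without a POSIX shell. This is a minimal sketch using the same ffmpeg flags as the loop above; the function names `ffmpeg_command` and `convert_all` are illustrative and are not part of the project's `utils.py`.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src: Path) -> list[str]:
    """Build the ffmpeg call used above: mono, 16 kHz, 16-bit PCM WAV."""
    dst = src.with_suffix(".wav")
    return ["ffmpeg", "-i", str(src), "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", str(dst)]


def convert_all(audio_dir: str = "data/audio") -> None:
    """Convert every MP3 in audio_dir; check=True stops on the first ffmpeg error."""
    for src in sorted(Path(audio_dir).glob("*.mp3")):
        subprocess.run(ffmpeg_command(src), check=True)
```

Calling `convert_all()` from the repository root writes each WAV file next to its source MP3.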

Usage

Run the CLI tool:

python -m app.cli --help

Complete Workflow

Process a D&D session from start to finish:

# 1. Transcribe audio to text
python -m app.cli transcribe data/audio/Session_090123_01.wav

# 2. Identify speakers in the audio
python -m app.cli diarize data/audio/Session_090123_01.wav

# 3. Align transcript with speaker information
python -m app.cli align Session01
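To make the alignment step concrete, here is one common way to merge transcript segments with speaker turns: give each segment the speaker whose turn overlaps it the most in time. This is a sketch under assumptions, not the code in `align.py`. The `Segment` and `Turn` types, the `SPEAKER_00`-style labels, and the `UNKNOWN` fallback are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align(segments: list[Segment], turns: list[Turn], unknown: str = "UNKNOWN") -> list[dict]:
    """Label each transcript segment with the speaker of its most-overlapping turn."""
    aligned = []
    for seg in segments:
        best_speaker, best = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best:
                best_speaker, best = turn.speaker, shared
        aligned.append({"speaker": best_speaker, "start": seg.start,
                        "end": seg.end, "text": seg.text})
    return aligned
```

Segments that overlap no turn keep the fallback label, so gaps in diarization stay visible rather than being silently assigned to a neighbor.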

Create a roster.json file to map speakers to characters:

{
  "dm": "Luke (DM)",
  "players": [
    {"name": "Jerome", "character": "Aguiar", "notes": "Human fighter"},
    {"name": "Nancy", "character": "Juniper", "notes": "Elf magic user"},
    {"name": "Chris", "character": "Starble", "notes": "Dwarf fighter"}
  ],
  "known_npcs": [
    {"name":"Glade", "notes":"Member of the party but controlled by the DM"},
    {"name":"Starla", "notes":"High charisma, older human female, innate storyteller character, lives outside of Drexville"}
  ],
  "tone":"Low magic, survival-focused campaign"
}
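A roster in this shape can be checked and flattened into plain text before it is handed to an LLM. The sketch below is illustrative only: which keys are required is an assumption, and `load_roster` and `roster_context` are hypothetical names rather than functions from `attribute.py` or `prompts.py`.

```python
import json


def load_roster(text: str) -> dict:
    """Parse roster JSON and check the keys this sketch assumes are required."""
    roster = json.loads(text)
    for key in ("dm", "players"):
        if key not in roster:
            raise ValueError(f"roster is missing required key: {key}")
    for player in roster["players"]:
        if "name" not in player or "character" not in player:
            raise ValueError(f"player entry needs name and character: {player}")
    return roster


def roster_context(roster: dict) -> str:
    """Render the roster as one line per participant, suitable for a prompt."""
    lines = [f"DM: {roster['dm']}"]
    for p in roster["players"]:
        note = f" ({p['notes']})" if p.get("notes") else ""
        lines.append(f"Player {p['name']} plays {p['character']}{note}")
    for npc in roster.get("known_npcs", []):
        note = f" ({npc['notes']})" if npc.get("notes") else ""
        lines.append(f"NPC {npc['name']}{note}")
    if roster.get("tone"):
        lines.append(f"Tone: {roster['tone']}")
    return "\n".join(lines)
```

Validating the file up front gives a clear error message instead of a confusing failure in the middle of attribution.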

# 4. Attribute dialogue lines to characters
python -m app.cli attribute Session01 roster.json

# 5. Generate scene summaries
python -m app.cli summarize Session01

# 6. Index content for vector search
python -m app.cli index Session01
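The six steps can also be scripted so that a session runs end to end. The sketch below only assembles and runs the commands shown above, in the same order. The names `workflow_commands` and `run_workflow` are hypothetical, and the session ID is passed in explicitly because this README does not state how it is derived from the audio filename.

```python
import subprocess
import sys


def workflow_commands(audio_path: str, session_id: str,
                      roster_path: str = "roster.json") -> list[list[str]]:
    """Return the six CLI invocations from the workflow above, in order."""
    base = [sys.executable, "-m", "app.cli"]
    return [
        base + ["transcribe", audio_path],
        base + ["diarize", audio_path],
        base + ["align", session_id],
        base + ["attribute", session_id, roster_path],
        base + ["summarize", session_id],
        base + ["index", session_id],
    ]


def run_workflow(audio_path: str, session_id: str,
                 roster_path: str = "roster.json") -> None:
    """Run each step in turn; check=True aborts the run if any step fails."""
    for cmd in workflow_commands(audio_path, session_id, roster_path):
        print("Running:", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)
```

For example, `run_workflow("data/audio/Session_090123_01.wav", "Session01")` would mirror the commands listed above when run from the repository root.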
