Your best training data is rotting in ~/.claude. cc-pushback mines every correction, interrupt, and rejected plan from your transcripts into judge-refined, TRL-ready SFT/DPO/KTO pairs on HuggingFace.
uvx cc-pushback scan
One pass over ~/.claude/projects fills ~/.cc-pushback/feedback.db with every correction you typed, every plan you rejected, and every inline review comment, conversational context included. `uvx cc-pushback stats` shows what landed.
Driving with an agent? Paste this:
Run `uvx cc-pushback scan` to mine my Claude Code transcripts into ~/.cc-pushback/feedback.db.
Then run `uvx cc-pushback stats` and report how much pushback was collected per source kind.
Docs: https://yasyf.github.io/cc-pushback/
You've spent months telling Claude "no, not like that", and that signal evaporates as transcripts rotate out. Judge the corpus, then distill the accepted events into atomic pairs grounded in the code they complain about:
uvx cc-pushback triage
uvx cc-pushback refine
uvx cc-pushback enrich
`uvx cc-pushback pairs` prints the deliverable: training pairs distilled from your own pushback, each carrying the conversational window and code evidence behind it.
Your taste is mostly tacit — you notice a rule when it's violated. The corpus makes it legible:
uvx cc-pushback list --source plan_review
On my machine the split is 698 mid-session corrections, 219 rejected plans, 41 review comments, and 22 interrupts. Another machine's history folds in too: mirror it with rsync and point `scan --transcripts` at it (the flag is repeatable, so several mirrors fold into one scan).
A judged corpus in SQLite trains nothing. Export projects it into TRL-ready configs:
uvx cc-pushback export --push
traces: train 1156 test 115
sft: train 499 test 67
dpo: train 363 test 44
kto: train 1156 test 115
Four configs land as per-split parquet in a private <hf-user>/cc-pushback-traces, next to a generated dataset card. Splits group on the session hash, so a session never straddles train and test.
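For intuition, here is a minimal sketch of a split grouped on a session hash. It is not cc-pushback's implementation: the helper names, the `session_id` field, the SHA-256 bucketing, and the 10% test fraction are all assumptions made for illustration.

```python
import hashlib


def assign_split(session_id: str, test_fraction: float = 0.1) -> str:
    """Map a session to 'train' or 'test' deterministically (hypothetical helper)."""
    # Hash the session id so the assignment is stable across runs and machines.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_rows(rows: list[dict], test_fraction: float = 0.1) -> dict[str, list[dict]]:
    """Route every row by its session's split, so no session straddles splits."""
    splits: dict[str, list[dict]] = {"train": [], "test": []}
    for row in rows:
        # `session_id` is an assumed field name, not the real export schema.
        splits[assign_split(row["session_id"], test_fraction)].append(row)
    return splits
```

Because the split depends only on the session id, every row from one session lands on the same side, and re-exporting a grown corpus never moves an old session across the boundary.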
- Incremental scanning: content digests and one-transaction commits make re-scans cheap and interrupt-safe (docs: scan)
- Judge, audit, eval: prompt-versioned triage, a seeded audit sample, and mechanical metrics with no LLM calls (docs: triage)
- Pair dashboard: browse refined pairs and their full lineage in a local web UI (docs: view-samples)
- Python API: drive the scanner and the feedback store from your own code (docs: reference)
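
The incremental-scanning idea in the first bullet can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code or its Python API: the `incremental_scan` function, the `seen` table, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def incremental_scan(paths: list[Path], conn: sqlite3.Connection) -> int:
    """Record files whose content changed since the last scan; return the count."""
    # `seen` is a hypothetical table, not cc-pushback's real schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen (path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    changed = 0
    # `with conn` commits once on success and rolls back if anything raises,
    # so an interrupted scan leaves the previous state untouched.
    with conn:
        for path in paths:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            row = conn.execute(
                "SELECT digest FROM seen WHERE path = ?", (str(path),)
            ).fetchone()
            if row is not None and row[0] == digest:
                continue  # Content unchanged: skip the expensive work.
            conn.execute(
                "INSERT OR REPLACE INTO seen (path, digest) VALUES (?, ?)",
                (str(path), digest),
            )
            changed += 1
    return changed
```

A second call over unchanged files returns 0, which is what makes re-scans cheap; only new or edited files are touched.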
Status: alpha — the pipeline runs end to end; the judge prompt still iterates (v5 today).
Read the docs for the full guide. Licensed under PolyForm Noncommercial 1.0.0.


