TuneOCR is a stable, production-ready, open-source framework for fine-tuning OCR models. It helps researchers, developers, and hobbyists train and adapt small OCR models more effectively across diverse document types and languages while keeping compute and cost practical.
TuneOCR provides reproducible training pipelines, dataset utilities, and helpers for flexible token / special-token positioning so you can experiment with language tags, transcription prompts, and label-formatting strategies without modifying model internals.
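As a rough illustration of special-token positioning, the sketch below places tags at the start, middle, or end of a target string before tokenization. The function name `place_special_tokens` and the `position` argument are hypothetical and chosen for this example; they are not taken from TuneOCR's actual API.

```python
from typing import Sequence


def place_special_tokens(text: str, tokens: Sequence[str], position: str = "start") -> str:
    """Return `text` with `tokens` inserted at the start, middle, or end.

    Hypothetical helper for illustration only; not TuneOCR's real interface.
    """
    tag = "".join(tokens)
    if position == "start":
        return f"{tag}{text}"
    if position == "end":
        return f"{text}{tag}"
    if position == "middle":
        # Split on whitespace and insert the tag at the word-level midpoint.
        # Note: this normalizes runs of whitespace to single spaces.
        words = text.split()
        half = len(words) // 2
        return " ".join(words[:half] + [tag] + words[half:])
    raise ValueError(f"position must be 'start', 'middle', or 'end', got {position!r}")


target = place_special_tokens("hello world", ["<|en|>", "<|transcribe|>"], "start")
print(target)
```

Because the change happens on the label string, the model architecture stays untouched; the tokenizer only needs to know the tags as special tokens.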
Key capabilities:
- Fine-tune a variety of compact OCR and document-understanding models.
- Reformat or inject special tokens (e.g., language tags) in training targets.
- Simple data collators and training scripts ready for CPU/GPU or small cloud instances.
- Extensible plugin/back-end system for new model integrations.
- Built-in deterministic runs and logging to reproduce experiments.
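A deterministic run usually starts by seeding every random number generator in use. The following is a minimal sketch of that idea, assuming a hypothetical helper named `seed_everything`; it is not necessarily how TuneOCR implements it. NumPy and PyTorch are treated as optional and are only seeded if they are installed.

```python
import os
import random


def seed_everything(seed: int = 42) -> int:
    """Seed the RNGs available in this environment (hypothetical helper)."""
    random.seed(seed)
    # Only affects subprocesses started later; the current interpreter's
    # hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
    return seed


seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print(first == second)
```

Seeding alone does not guarantee bitwise-identical results on GPU, since some kernels are non-deterministic; it is a starting point for reproducible comparisons.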
TuneOCR ships with integrations or examples for the following architectures:
- TrOCR — Single-line handwriting and printed text.
- Donut — Document-to-JSON extraction (receipts, invoices, structured forms).
- VLMs — Vision-language models for complex document QA and understanding.
- QWEN OCR — Multi-language, high-capacity OCR models.
- Nanonets OCR — Lightweight, fast OCR suitable for scanned forms and field capture.
- OlmOCR — Flexible research-oriented OCR framework for experimentation.
If a backend you need is missing, TuneOCR is designed so you can add one quickly (processor, model wrapper, data prep).
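To make the three pieces (processor, model wrapper, data prep) concrete, here is a small sketch of what a backend registry could look like. All names here (`Backend`, `register_backend`, `get_backend`, and the `toy` backend) are hypothetical and invented for this example; consult the repository for the real extension points.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Backend:
    """Bundle of the three pieces a backend supplies (hypothetical layout)."""
    name: str
    load_processor: Callable[[], Any]
    load_model: Callable[[], Any]
    prepare_example: Callable[[Dict[str, Any]], Dict[str, Any]]


_BACKENDS: Dict[str, Backend] = {}


def register_backend(backend: Backend) -> None:
    """Add a backend to the registry, refusing duplicate names."""
    if backend.name in _BACKENDS:
        raise ValueError(f"backend {backend.name!r} is already registered")
    _BACKENDS[backend.name] = backend


def get_backend(name: str) -> Backend:
    """Look up a registered backend by name."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise KeyError(f"unknown backend {name!r}; registered: {sorted(_BACKENDS)}") from None


# A toy backend with placeholder loaders, just to show the shape.
register_backend(Backend(
    name="toy",
    load_processor=lambda: None,
    load_model=lambda: None,
    prepare_example=lambda ex: {"image": ex["image"], "labels": ex["text"].strip()},
))

print(get_backend("toy").prepare_example({"image": "img.png", "text": " hi "}))
```

Keeping the loaders as callables defers heavy model loading until a backend is actually selected.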
- Open source & community-first — Inspect, reuse, and extend the training recipes.
- Small-model focused — Emphasizes parameter-efficient approaches (LoRA/adapters, selective fine-tuning) and practical defaults so you can iterate on modest hardware.
- Flexible label formatting — Move language/task tokens (e.g., <|en|>, <|transcribe|>) to the start, middle, or end of targets to test what works for your data.
- Reproducible experiments — Deterministic seeds, config-driven runs, and logging for fair comparisons.
- Practical evaluations — WER / CER and structured extraction checks (for Donut-like workflows).
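For reference, CER and WER are both edit distance divided by reference length, computed over characters and whitespace-separated words respectively. The sketch below is a plain-Python version for illustration; the function names are assumptions for this example, and TuneOCR's own evaluation code may differ (for instance in text normalization).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitten", "sitting"))
print(wer("the cat sat", "the cat sit"))
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, which is worth keeping in mind when averaging over a dataset.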
Python 3.11 recommended
pip install -r requirements.txt
Contributors are super welcome! Help the project grow by opening issues and submitting PRs.
How to contribute:
Open an issue for bugs, feature requests, or new backend proposals. Include:
- Problem statement / feature description
- Minimal repro or sample dataset (if applicable)
- Expected behavior or desired API
Create a Pull Request (PR) for fixes, features, docs, or new backends. PR checklist we appreciate:
- Descriptive title and summary of the change
- Tests where appropriate (unit or small integration)
- Updated CHANGELOG.md and documentation for visible changes
Use feature branches named like feature/qwen-backend or fix/token-positioning-bug.
All contributors are welcome. To be acknowledged:
- Add yourself to CONTRIBUTORS.md via PR, or open an issue to request inclusion.
- Major contributions will be noted in CHANGELOG.md and release notes.
- Author / Maintainer: Emkay Nguyen
- Email: minhkhoinguyendo1210@gmail.com
For coordination, partnership, or academic collaboration, open an issue or email the maintainer.
