The Operating System for AI Experiments.
Turn every experiment into a standardized, reproducible asset — run, track, compare, and analyze with one workflow.
TraceOS is a lightweight experiment runtime system for AI research. It unifies the full lifecycle of an experiment:
- Run experiments
- Track execution history
- Generate structured reports
- Analyze results (capabilities, failures, recommendations)
- Compare runs side by side
Every experiment becomes a standardized, traceable asset.
git clone https://github.com/aaa-mvc/traceos.git
cd traceos
pip install -e .
experiment run configs/mock_bottles.yaml
No GPU required. You will see:
TraceOS v0.1.0
Experiment: mock-bottles-v1
Run: run_9392e4aecb07
Plugin: mock
[1/4] Preparing dataset... Done
[2/4] Training... Completed
[3/4] Evaluating... 85.0% success
[4/4] Generating report... Done
Done. Run: run_9392e4aecb07
What you got:
report.html -- outputs/run_9392e4aecb07/report.html
analysis.json -- outputs/run_9392e4aecb07/analysis.json
events.jsonl -- outputs/run_9392e4aecb07/events.jsonl (17 lifecycle events)
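The event log is a JSON Lines file, so it can be inspected with a few lines of Python. The sketch below is not part of TraceOS. It assumes only that each non-empty line is one JSON object. The `event` key used for grouping is a hypothetical field name, so substitute whatever key the real log uses.

```python
import json
from collections import Counter


def parse_events(lines):
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by(events, field="event"):
    """Count events by `field`.

    The default key "event" is an assumption for illustration,
    not a documented TraceOS field name.
    """
    return Counter(e.get(field, "<missing>") for e in events)


# Usage, with the path reported by the run above:
# with open("outputs/run_9392e4aecb07/events.jsonl", encoding="utf-8") as fh:
#     events = parse_events(fh)
# print(len(events), count_by(events).most_common())
```

Events that lack the chosen key are grouped under `<missing>`, which makes a wrong guess at the field name easy to spot.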
Why this matters:
experiment runs list # All your experiments, searchable
experiment compare <id1> <id2> # Compare any two runs side by side
experiment analyze <run-id> # Capability scores + failure analysis + recommendations
experiment analyze <run-id>
Example analysis.json:
{
  "capability": {
    "cap_success": { "label": "Success Rate", "value": 0.85 },
    "cap_precision": { "label": "Precision", "value": 0.765 },
    "cap_speed": { "label": "Speed", "value": 0.636 },
    "cap_robustness": { "label": "Robustness", "value": 0.75 }
  },
  "failure": {
    "total_failures": 1,
    "categories": { "grasp_failure": 1 }
  },
  "recommendations": [
    {
      "priority": "high",
      "description": "Success rate is 85.0%. 1/5 episodes failed. Consider increasing training data.",
      "evidence": "success_rate=0.85, failures=1"
    }
  ]
}
experiment compare <run-id-1> <run-id-2>
Metric        Baseline  Current  Delta  Winner
----------------------------------------------------------
Success Rate  0.850     0.850    0.0%   tie
Precision     0.765     0.765    0.0%   tie
Speed         0.636     0.636    0.0%   tie
Robustness    0.750     0.750    0.0%   tie
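A comparison like the one above can be approximated directly from two analysis.json files. The sketch below is an illustration, not the TraceOS implementation. It relies on the `capability` layout shown in the example analysis.json, and it makes two assumptions of its own: the delta is the relative change against the baseline, and a higher value always wins.

```python
def capability_scores(analysis):
    """Map each capability label to its value, per the example analysis.json layout."""
    return {entry["label"]: entry["value"] for entry in analysis["capability"].values()}


def compare_runs(baseline, current, tol=1e-9):
    """Return (label, baseline, current, delta_pct, winner) rows.

    Assumptions (not confirmed by TraceOS): delta is relative change in
    percent, and higher values are better for every metric.
    """
    base = capability_scores(baseline)
    cur = capability_scores(current)
    rows = []
    for label, b in base.items():
        if label not in cur:
            continue  # only metrics present in both runs are compared
        c = cur[label]
        # Relative change is undefined for a zero baseline, so report None.
        delta_pct = (c - b) / b * 100.0 if b else None
        if abs(c - b) <= tol:
            winner = "tie"
        elif c > b:
            winner = "current"
        else:
            winner = "baseline"
        rows.append((label, b, c, delta_pct, winner))
    return rows


# Usage:
# import json
# a = json.load(open("outputs/<run-id-1>/analysis.json"))
# b = json.load(open("outputs/<run-id-2>/analysis.json"))
# for row in compare_runs(a, b):
#     print(row)
```

Comparing a run against itself yields a delta of 0.0 and `tie` for every metric, which matches the shape of the table above.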
Every experiment produces a standardized artifact directory:
outputs/<run-id>/
├── report.html # Self-contained HTML report
├── analysis.json # Structured analysis (RFC-0005)
├── events.jsonl # Full execution trace (17+ events)
├── experiment.json # Frozen config snapshot
├── artifacts.json # Output index
├── train/
│ ├── checkpoint/last.pt
│ ├── metrics.jsonl
│ └── stdout.log
└── eval/
└── summary.json
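Because the layout is fixed, a script can check that a run directory is complete before consuming it. The sketch below is a standalone helper, not a TraceOS command. The file list is copied from the tree above, and the names `EXPECTED_FILES` and `missing_artifacts` are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the artifact tree above.
EXPECTED_FILES = [
    "report.html",
    "analysis.json",
    "events.jsonl",
    "experiment.json",
    "artifacts.json",
    "train/checkpoint/last.pt",
    "train/metrics.jsonl",
    "train/stdout.log",
    "eval/summary.json",
]


def missing_artifacts(run_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under run_dir."""
    root = Path(run_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


# Usage:
# missing = missing_artifacts("outputs/run_9392e4aecb07")
# if missing:
#     print("Incomplete run, missing:", ", ".join(missing))
```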
CLI (10 commands)
↓
Runner (lifecycle + event emission)
↓
Adapter Layer (ABC / Mock / Dummy)
↓
Artifact Layer (standard outputs + registry)
↓
Analysis Engine (capability / failure / compare / recommend)
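To make the Runner and Adapter layers concrete, here is a minimal sketch of how a runner might drive an adapter through its stages while emitting lifecycle events. Everything in it is hypothetical: the `Adapter` interface, the stage names, and the event names are assumptions for illustration and may not match the real TraceOS SDK.

```python
import time
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical adapter interface; the real TraceOS SDK may differ."""

    @abstractmethod
    def prepare(self, config): ...

    @abstractmethod
    def train(self, config): ...

    @abstractmethod
    def evaluate(self, config): ...


class MockAdapter(Adapter):
    """Returns fixed, illustrative values so the flow runs without a GPU."""

    def prepare(self, config):
        return {"status": "done"}

    def train(self, config):
        return {"status": "completed"}

    def evaluate(self, config):
        return {"success_rate": 0.85}


def run(adapter, config, sink):
    """Run each stage in order, appending one event dict to `sink` per transition."""

    def emit(kind, **data):
        sink.append({"ts": time.time(), "event": kind, **data})

    emit("run_started", experiment=config.get("name"))
    results = {}
    for stage in ("prepare", "train", "evaluate"):
        emit("stage_started", stage=stage)
        try:
            results[stage] = getattr(adapter, stage)(config)
        except Exception as exc:
            # Record the failure in the trace before propagating it.
            emit("stage_failed", stage=stage, error=str(exc))
            emit("run_finished", status="failed")
            raise
        emit("stage_finished", stage=stage)
    emit("run_finished", status="ok")
    return results


# Usage:
# events = []
# results = run(MockAdapter(), {"name": "mock-bottles-v1"}, events)
```

Keeping event emission in the runner rather than in each adapter means an adapter only has to implement the stages, and every run produces a trace with the same shape regardless of the backend.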
| Without TraceOS | With TraceOS |
|---|---|
| Scripts scattered across directories | One command per experiment |
| Results hard to compare | experiment compare built in |
| No execution trace | Full event log per run |
| Manual analysis | Automated capability + failure analysis |
| No run history | Searchable registry |
| Adapter | Config | GPU | Notes |
|---|---|---|---|
| mock | configs/mock_bottles.yaml | No | Instant demo, deterministic |
| dummy | configs/dummy.yaml | No | Minimal SDK example (45 lines) |
| abc | configs/bottle.yaml | Yes (8x) | Full ABC-130K DiT pipeline |
TraceOS capability analysis follows the Capability Schema Spec — a lightweight standard for defining what a capability is, how metrics witness it, and how evidence supports scores. The schema is consumed by schema_adapter.py (transformation-only, no runtime participation).
- entry_points plugin discovery
- OpenVLA / ACT / Pi0 adapters
- Multi-domain capability schemas (agent, ocr, llm)
- Improved failure analysis accuracy
- Cloud runner abstraction
TraceOS was inspired by the ABC-130K project and was initially built as an experiment wrapper for it.
- Paper: Scalable Behavior Cloning with Open Data, Training, and Evaluation — Allshire et al., arXiv 2606.27375 (2026)
- Repository: amazon-far/abc
- Dataset: XDOF/ABC-130k
The ABC Adapter wraps the original scripts via subprocess — zero lines of ABC code are modified. TraceOS is not affiliated with the ABC authors or Amazon.
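The subprocess approach can be sketched as follows. This is a generic illustration of wrapping an unmodified script, not the actual ABC Adapter code. The function name `run_script` is hypothetical, and any command passed to it is an assumption.

```python
import subprocess
import sys


def run_script(args, log_path, cwd=None):
    """Run an external command as-is, capturing stdout and stderr in one log file.

    Returns the process exit code so the caller can decide how to record failure.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.run(
            args,
            cwd=cwd,
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr into the same log
        )
    return proc.returncode


# Usage with a placeholder command:
# code = run_script([sys.executable, "-c", "print('training...')"], "stdout.log")
```

Because the wrapped script runs in its own process, the wrapper depends only on its command line and exit code, which is what allows the original code to stay untouched.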
Apache 2.0
