Deterministic, replayable, ablation-friendly multimodal labeling and synthetic-data pipeline built on Ray + vLLM.
- Multimodal Labeling: VLM-based image captioning, attribute tagging, and text classification
- Deterministic Pipelines: Prompt/model pinning, stable row IDs, and explicit seed management
- Replayable Runs: Content-addressed caching and comprehensive manifests for full reproducibility
- Ablation-Friendly: Matrix experiments with shared caches and run comparison tools
- Rubric Scoring: Large-batch structured evaluation with guided decoding
- Hard-Negative Mining: Embedding + reranking pipeline for high-quality training data
- Synthetic Data Generation: Text and VLM-grounded conversation synthesis with quality filtering
LabelForge uses Ray Data LLM (build_processor + vLLMEngineProcessorConfig) as the inference backbone, providing:
- Efficient batch inference across multiple GPUs
- Per-row sampling parameters for ablations
- Native VLM support with PIL image inputs
- Guided decoding for structured JSON outputs
labelforge/
├── core/ # Schemas, hashing, seeds, environment capture
├── io/ # Dataset I/O, JSONL manifests, images
├── llm/ # Ray Data LLM processor factory, determinism toggles
├── pipelines/ # Stage abstraction, DAG, runner
├── cache/ # Content-addressed row/stage caching
├── mining/ # Hard-negative candidate generation and selection
├── synth/ # Synthetic data generation and deduplication
├── eval/ # Score normalization and metrics
└── cli/ # Command-line interface
# Basic installation
pip install labelforge
# With S3 cache backend
pip install labelforge[s3]
# Development
pip install -e ".[dev]"# Run a labeling pipeline
labelforge run --config configs/mvp.yaml
# Replay a previous run
labelforge replay --manifest runs/<run_id>/manifest.jsonl
# Compare two runs
labelforge diff runs/<run_a> runs/<run_b>
# Inspect run artifacts
labelforge inspect runs/<run_id>LabelForge provides two determinism modes:
-
Standard Mode (default): Uses
VLLM_BATCH_INVARIANT=1for scheduling-insensitive outputs. Best throughput with reproducible results. -
Strict Mode: Additionally enables Ray Data
preserve_orderand disables vLLM multiprocessing. Maximum reproducibility at the cost of throughput.
- Same code revision and config
- Pinned prompt pack version
- Pinned model revision
- Fixed seeds
- Same hardware profile (GPU type, count)
- Same Ray + vLLM versions
See docs/determinism.md for detailed caveats.
- Architecture Overview
- Run Contract
- I/O Layout
- Determinism Caveats
- Adding New Stages
- Running Ablations
- Performance Tuning
This software is proprietary under a Portfolio/Research-Only License.
- No commercial or professional use permitted
- Research use requires citation — see CITATION.cff
- No external contributions accepted
See LICENSE for full terms.
