GitHub - waycodes/embodied-agent: A dataset compiler that converts heterogeneous robot trajectories (RLDS, LeRobot, Open X-Embodiment) into a streamable storage format with strict validation and sliceable metadata. · GitHub
Skip to content

waycodes/embodied-agent

Repository files navigation

EmbodiedDataKit

A dataset compiler that converts heterogeneous robot trajectories (RLDS, LeRobot, Open X-Embodiment) into a streamable storage format with strict validation and sliceable metadata.

Features

  • Multi-format ingestion: Load from RLDS/TFDS, LeRobotDataset v3, Open X-Embodiment
  • Canonical representation: RLDS-aligned Episode/Step structure
  • Strict validation: Structural, semantic, and statistical checks
  • LeRobotDataset v3 output: Parquet + MP4 + relational metadata for Hub streaming
  • RLDS/TFDS export: Secondary format for ecosystem compatibility
  • Sliceable metadata: Query and filter by task, robot, episode properties
  • Cross-embodiment transforms: Camera selection, action mapping, normalization

Installation

# Core installation
pip install embodied-datakit

# With TensorFlow Datasets support (for OXE)
pip install embodied-datakit[tfds]

# With video encoding
pip install embodied-datakit[video]

# Full installation
pip install embodied-datakit[all]

Quick Start

# Probe a dataset
edk ingest oxe://berkeley_autolab_ur5

# Compile to LeRobotDataset v3
edk compile oxe://berkeley_autolab_ur5 --split "train[:100]" -o ./compiled/ur5

# Inspect the result
edk inspect ./compiled/ur5

# Export to RLDS format
edk export-rlds ./compiled/ur5 -o ./rlds/ur5

Documentation

See docs/ for detailed documentation:

License

Proprietary – Research/Portfolio Use Only

This software is provided under a restrictive proprietary license:

  • ❌ No commercial or professional use
  • ❌ No redistribution or derivative works
  • ✅ Non-commercial research with required citation
  • ✅ Personal evaluation

If you use this software in academic research, you must cite it:

@software{bharadwaj2026embodieddatakit,
  author = {Bharadwaj, Varun},
  title = {EmbodiedDataKit},
  year = {2026},
  url = {https://github.com/waycodes/embodied-agent}
}

See LICENSE and CITATION.cff for full terms.

About

A dataset compiler that converts heterogeneous robot trajectories (RLDS, LeRobot, Open X-Embodiment) into a streamable storage format with strict validation and sliceable metadata.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors