The Official Model Zoo for Tensorbit Labs.
This repository serves as a centralized library for pre-optimized neural network, large language model, and vision transformer binaries. Each model in this collection has been processed through the full Tensorbit P-D-Q pipeline (Pruning, Distillation, and Quantization) to ensure maximum performance on edge hardware without sacrificing reasoning capabilities.
Standard open-source models are often too "heavy" for on-device deployment. Tensorbit Labs transforms them into lightweight, efficient versions by combining pruning, distillation, and quantization, which reduces size and latency while maintaining high accuracy.
- Memory Efficiency: Up to % reduction in VRAM footprint.
- Inference Speed: Optimized for tensorbit-run execution on NPU/ARM architectures.
- Verified Benchmarks: Every binary is benchmarked via tensorbit-bench to ensure accuracy parity with the original models.
See performance_comparison.csv for a side-by-side comparison of raw PyTorch and Tensorbit stats for every model in the zoo.
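The column layout of performance_comparison.csv is not described here, so a quick way to get oriented is to list its header fields before reading the numbers. The helper below is a minimal sketch: the function name list_csv_columns is hypothetical, and it assumes a comma-delimited file whose first line is a header with no quoted commas.

```shell
# Hypothetical helper: print the header fields of a CSV, one per line, numbered.
# Assumes a comma-delimited file whose first line is the header and whose
# header fields contain no quoted commas.
list_csv_columns() {
  csv="${1:-performance_comparison.csv}"   # defaults to the file named in this README
  head -n 1 "$csv" | tr ',' '\n' | nl
}
```

Calling list_csv_columns with no argument reads performance_comparison.csv from the current directory; pass a path to inspect a copy stored elsewhere.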
These models are stored as .tb binaries designed to be loaded directly into the tensorbit-run engine.
# Example: Running a Tensorbit model locally
./tensorbit-run --model ./models/tb-llama-4-8b.tb --prompt "Explain quantum gravity."

We focus on optimizing high-impact, open-weights models. If you would like to request a specific model optimization or contribute a "Tensorbit-ified" version of your own architecture, please open an issue or pull request with the label model-request.
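To try the tensorbit-run invocation shown earlier against every binary in a directory, a small loop may be convenient. This is a sketch under stated assumptions: the function name run_all_models and its argument order are hypothetical, the default paths mirror the example command, and only the --model and --prompt flags that appear in this README are used.

```shell
# Hypothetical helper: run one prompt against every .tb file in a directory.
# Uses only the --model and --prompt flags shown in this README.
run_all_models() {
  model_dir="${1:-./models}"
  prompt="${2:-Explain quantum gravity.}"
  runner="${3:-./tensorbit-run}"   # assumed location of the engine binary

  for model in "$model_dir"/*.tb; do
    # An unmatched glob stays literal, so skip paths that do not exist.
    [ -e "$model" ] || continue
    printf '== %s ==\n' "$model"
    "$runner" --model "$model" --prompt "$prompt" \
      || printf 'failed: %s\n' "$model" >&2
  done
}
```

For example, run_all_models ./models "Summarize this repository." would run that prompt once per binary and print a header line before each model's output.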
The optimization weights (.tbm model files) are provided under the Apache License 2.0. Please refer to the original model creators (e.g., Meta, Mistral AI) for the licenses that apply to their underlying architectures.
