- Forked from triton-inference-server/fastertransformer_backend (Python)
- ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator (forked from microsoft/onnxruntime; C++)
- A high-throughput and memory-efficient inference and serving engine for LLMs (Python, 77.9k stars, 16k forks)
- A blazing fast inference solution for text embeddings models (Rust, 4.7k stars, 384 forks)
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python, 3.1k stars, 490 forks)
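As context for the LLM serving-engine entry above, a minimal sketch of offline inference with vLLM's Python API; the model name is an illustrative placeholder, and any model supported by vLLM could be substituted:

```python
# Minimal offline-inference sketch using vLLM's Python API.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

# "facebook/opt-125m" is only a small placeholder model for illustration.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Print the generated continuation for each prompt.
    print(output.outputs[0].text)
```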