A high-throughput and memory-efficient inference and serving engine for LLMs
Python, 84.3k stars, 18.5k forks
A framework for efficient model inference with omni-modality models
Python, 5.3k stars, 1.2k forks
An optimized Merkle Patricia Trie implementation on GPU, fully compatible with and integrable into Ethereum. The paper was published at VLDB 2024.
Cuda, 14 stars, 1 fork
Source code for our DaMoN@SIGMOD 2024 paper "How Does Software Prefetching Work on GPU Query Processing?"
Cuda, 9 stars
A cross-modal vector index with fast construction in heterogeneous CPU-GPU environments. Published at DaMoN@SIGMOD 2025.
Cuda, 16 stars
Large Brain Models with Test Time Training
Python, 38 stars