Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Python, 2.3k stars, 195 forks
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Python, 879 stars, 120 forks
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python, 1.1k stars, 86 forks
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
Python, 280 stars, 24 forks
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
Python, 130 stars, 17 forks
The ultimate Rubik's Cube solving algorithm for high-speed axial robots.
C++, 144 stars, 12 forks