GPTQ-style quantization of LLMs leaves performance on the table: WaterSIC closes the gap with an information-theoretically near-optimal approach that outperforms the state of the art on Llama and Qwen.
Waterfilling-inspired quantization ("WaterSIC") cuts LLM quantization error by allocating bits according to the weight covariance, outperforming standard techniques such as GPTQ.
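To illustrate the idea the blurb alludes to, here is a minimal sketch of classic rate-distortion ("reverse waterfilling") bit allocation: groups with larger variance receive more bits so that per-group distortion is roughly equalized. This is a generic textbook illustration, not WaterSIC's actual algorithm; the function name `waterfill_bits` and the example variances are hypothetical.

```python
import numpy as np

def waterfill_bits(variances, total_bits):
    """Textbook waterfilling-style bit allocation (illustrative sketch).

    Each group gets the average budget plus a correction proportional to
    how its variance compares to the geometric mean of all variances.
    Rounding/clipping means the exact budget is only approximately met;
    a production scheme would iterate to enforce the constraint.
    """
    variances = np.asarray(variances, dtype=float)
    n = len(variances)
    geo_mean = np.exp(np.mean(np.log(variances)))
    bits = total_bits / n + 0.5 * np.log2(variances / geo_mean)
    return np.clip(np.round(bits), 0, None).astype(int)

# Example: four weight groups with unequal variance, 16 bits in total.
print(waterfill_bits([4.0, 1.0, 0.25, 0.0625], total_bits=16))  # e.g. [6 4 4 2]
```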