MXFP4 quantization just got a whole lot better: BATQuant recovers up to 96.43% of full-precision performance in LLMs and MLLMs, even under aggressive W4A4KV16 settings, by preventing outlier propagation across quantization blocks.
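For context, MXFP4 stores weights as FP4 (E2M1) values in small blocks that share one power-of-two scale, so one outlier can degrade every element in its block. Below is a minimal fake-quantization sketch of that block format; `mxfp4_quant_dequant` and its details are illustrative, not BATQuant's actual method.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1 (max normal value is 6 = 1.5 * 2^2)
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quant_dequant(x, block=32):
    """Fake-quantize a 1-D array MXFP4-style: per-block shared
    power-of-two scale, FP4 E2M1 elements. Illustrative sketch."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        amax = np.abs(blk).max()
        if amax == 0:
            out[i:i + block] = 0.0
            continue
        # Shared scale: align the block's max magnitude with FP4's top
        # binade (exponent 2). A single outlier inflates amax and thus
        # coarsens the grid for every other element in the block.
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
        # Round each scaled magnitude to the nearest FP4 grid point,
        # preserving sign, then dequantize back.
        idx = np.abs(np.abs(blk) / scale - FP4_GRID[:, None]).argmin(axis=0)
        out[i:i + block] = np.sign(blk) * FP4_GRID[idx] * scale
    return out
```

Running it on values already on the grid (e.g. `[6.0, 3.0, 1.5, 0.0]`) returns them unchanged, while a block containing one large outlier loses precision on its small entries, which is the propagation effect the summary refers to.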
FreeAct boosts quantized LLM performance by dynamically adapting activation transformations to different token types, moving beyond the static transformations that limit existing methods.
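The contrast with static methods can be sketched generically: a static approach applies one fixed smoothing transform to every token, while a dynamic one picks the transform per token from a simple statistic. Everything below (the threshold rule, the function names, the INT4 fake-quantizer) is a hypothetical stand-in to show the shape of the idea, not FreeAct's actual mechanism.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric per-token INT4 fake quantization (illustrative)."""
    s = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    s = np.where(s == 0, 1.0, s)
    return np.clip(np.round(x / s), -8, 7) * s

def dynamic_activation_transform(X, smooth, thresh=4.0):
    """Hypothetical token-adaptive transform: apply a smoothing vector
    only to outlier-heavy tokens, leave ordinary tokens untouched.

    X:      (tokens, channels) activations
    smooth: (channels,) per-channel smoothing scales
    """
    # Crude token-type signal: does this token contain a large outlier?
    is_outlier = np.abs(X).max(axis=-1, keepdims=True) > thresh
    # Smooth only the outlier tokens before quantizing.
    Xs = np.where(is_outlier, X / smooth, X)
    q = quantize_int4(Xs)
    # Undo the smoothing (in a real pipeline the weights would absorb it).
    return q * np.where(is_outlier, smooth, 1.0)
```

A static method would call `quantize_int4(X / smooth)` with the same `smooth` for all tokens; the dynamic version above instead branches on a per-token statistic, which is the distinction the summary draws.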