3
0
6
0
LLMs can be aggressively quantized to W(1+1)A4 without significant performance degradation using a surprisingly simple three-stage distillation approach.
Continual learning methods for Video-LLMs face a fundamental trade-off: mitigating catastrophic forgetting often comes at the cost of generalization or prohibitive computational overhead.
Quantizing large vision-language models just got a whole lot better: a new token-level sensitivity metric narrows the accuracy gap to full-precision models by up to 1.6% in 3-bit weight-only quantization.