Uniformly quantizing the entire diffusion action head of vision-language-action models (VLAs) to W4A4 (4-bit weights and activations) can match or exceed FP16 performance, contrary to conventional wisdom, while cutting the memory footprint by 71%.
Quantizing ASR models can actually *improve* performance on rare words, without hurting overall accuracy, by strategically re-weighting the calibration data.