Uniformly quantizing the entire diffusion action head of vision-language-action (VLA) models to W4A4 (4-bit weights and 4-bit activations) is not only feasible but can match or exceed FP16 performance. This runs contrary to conventional expectations and cuts the memory footprint by 71%.
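To make "W4A4" concrete, here is a minimal sketch of generic symmetric, per-tensor uniform quantization to signed 4-bit integers (range [-8, 7]), applied to both the weights and the activations of a small matrix-vector product. This is an illustrative assumption, not the quantization scheme of the work summarized above, which may use per-channel or per-group scales, calibration data, or other details. The function names `quantize_int4` and `w4a4_matvec` are hypothetical.

```python
def quantize_int4(values):
    """Symmetric per-tensor uniform quantization to signed 4-bit ints in [-8, 7].

    Returns the integer codes and the single float scale shared by the tensor.
    Note: Python's round() uses round-half-to-even.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # map the largest magnitude to 7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def w4a4_matvec(weights, activations):
    """Matrix-vector product with both operands quantized to 4 bits (W4A4).

    Accumulation happens on integer codes; the two scales are applied once
    per output element, as an integer kernel would do.
    """
    rows, cols = len(weights), len(weights[0])
    flat_w = [w for row in weights for w in row]
    w_codes, w_scale = quantize_int4(flat_w)
    x_codes, x_scale = quantize_int4(activations)
    out = []
    for r in range(rows):
        row_codes = w_codes[r * cols:(r + 1) * cols]
        acc = sum(w * x for w, x in zip(row_codes, x_codes))  # integer dot product
        out.append(acc * w_scale * x_scale)  # rescale to float
    return out


if __name__ == "__main__":
    W = [[1.0, 0.5], [-0.25, 0.75]]
    x = [0.5, -1.0]
    exact = [sum(w * v for w, v in zip(row, x)) for row in W]
    print("fp   :", exact)
    print("w4a4 :", w4a4_matvec(W, x))
```

A single per-tensor scale is the simplest choice; finer-grained scales usually reduce error at the cost of storing more scale values.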
Static rankings of attention heads by local versus global behavior become unreliable once an LLM's attention mechanisms are hybridized, which motivates adaptive selection methods such as BOSCH.