Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
Diffusion language models can now match autoregressive quality, thanks to a clever trick that forces them to agree with themselves.
Forget SVD: CARE aligns low-rank attention approximations with input activations, boosting accuracy by up to 1.7x and cutting perplexity by a factor of 215 when converting models to multi-head latent attention.