Search papers, labs, and topics across Lattice.
2
0
5
0
Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
Diffusion language models can now match autoregressive quality, thanks to a clever trick that forces them to agree with themselves.