Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
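To see why a Hadamard rotation helps plain INT4, here is a minimal NumPy sketch (not the paper's implementation; the dimensions, outlier magnitude, and symmetric 4-bit scheme are illustrative assumptions). LLM activations typically have a few outlier channels that blow up the per-token scale; an orthonormal Hadamard rotation spreads that energy across all channels, so the same 16 quantization levels cover the data far more finely.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction (n must be a power of two), normalized
    # so that H @ H.T == I and the rotation is exactly invertible.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def int4_roundtrip(x):
    # Symmetric per-token INT4: one scale per row, levels in [-7, 7].
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(x / scale), -7, 7)
    return q * scale  # dequantize for error measurement

d = 256  # hidden size (illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, d))
x[:, 0] *= 50.0  # inject an outlier channel, as seen in LLM activations

H = hadamard(d)
plain_err = np.abs(int4_roundtrip(x) - x).mean()
# Rotate, quantize in the rotated basis, rotate back.
rot_err = np.abs(int4_roundtrip(x @ H) @ H.T - x).mean()
print(plain_err, rot_err)  # rotation yields a much smaller error
```

Because the rotation is orthonormal it can be fused into adjacent weight matrices at no runtime cost, which is why this style of scheme preserves throughput.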
Sparse queries offer a surprisingly effective and efficient alternative to dense representations for image-to-3D generation, achieving comparable fidelity with less input-view bias.
Diffusion language models can now match autoregressive quality, thanks to a clever trick that forces them to agree with themselves.