Forget fancy quantization schemes: simple token-wise INT4 quantization with a Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
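The core idea is simple enough to show in a few lines. Below is a minimal NumPy sketch (not the paper's code): multiplying activations by an orthonormal Hadamard matrix spreads outlier channels across all dimensions, shrinking the per-token dynamic range so symmetric INT4 rounding loses far less precision. All names and the toy outlier setup here are illustrative assumptions.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal: H @ H.T == I

def int4_quantize_per_token(x):
    # Symmetric token-wise INT4: one scale per row (token), range [-8, 7].
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale  # dequantized values

rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=(4, d))
x[:, 0] += 20.0  # inject an outlier channel, as seen in LLM activations

H = hadamard(d)
err_plain = np.abs(int4_quantize_per_token(x) - x).mean()
xr = x @ H  # rotate before quantizing, rotate back after
err_rot = np.abs(int4_quantize_per_token(xr) @ H.T - x).mean()
print(err_plain, err_rot)  # rotation shrinks the outlier-driven error
```

Because the rotation is orthonormal, it can be folded into adjacent weight matrices at no runtime cost, which is why throughput is preserved.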
Diffusion language models can now match autoregressive quality, thanks to a clever trick that forces them to agree with themselves.
Verifier-free evolution can now match or exceed the performance of verifier-based methods, while slashing API costs by 3x and boosting throughput by 10x, thanks to a clever model orchestration strategy.
Forget SVD: CARE aligns low-rank attention approximations with input activations, boosting accuracy up to 1.7x and slashing perplexity by 215x when converting models to multi-head latent attention.
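To make the contrast with plain SVD concrete, here is a hedged NumPy sketch of one standard way to align a low-rank factorization with input activations (whitening by the Cholesky factor of the activation Gram matrix); CARE's actual method may differ, and all names and shapes below are assumptions.

```python
import numpy as np

def lowrank_plain(W, k):
    # Truncated SVD: minimizes ||W - W_k||_F, ignoring the data.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

def lowrank_activation_aware(W, X, k, eps=1e-6):
    # Whiten by the Cholesky factor of the activation Gram matrix so the
    # truncated SVD minimizes ||(W - W_k) X||_F instead of ||W - W_k||_F.
    L = np.linalg.cholesky(X @ X.T + eps * np.eye(X.shape[0]))
    U, S, Vt = np.linalg.svd(W @ L, full_matrices=False)
    Wk_L = (U[:, :k] * S[:k]) @ Vt[:k]
    return Wk_L @ np.linalg.inv(L)

rng = np.random.default_rng(0)
d_in, d_out, n, k = 32, 32, 256, 8
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(d_in, n))      # calibration activations (columns)
X[0] *= 10.0                        # one dominant input direction

e_svd = np.linalg.norm((W - lowrank_plain(W, k)) @ X)
e_aware = np.linalg.norm((W - lowrank_activation_aware(W, X, k)) @ X)
print(e_svd, e_aware)  # activation-aware factorization fits the data better
```

The point of the contrast: plain SVD spends its rank budget uniformly, while the activation-aware variant concentrates it on the directions the model actually sees.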
Models are substantially better at pairwise self-verification than independent scoring, unlocking a more efficient and accurate approach to test-time scaling for complex reasoning.
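The selection loop this enables can be sketched in a few lines. This is an illustrative round-robin tournament, not the paper's algorithm; `compare` stands in for a model call that judges a pair of candidate answers, and the toy length-based judge is purely a placeholder.

```python
from itertools import combinations

def pick_best(candidates, compare):
    """Select a final answer via round-robin pairwise comparisons.

    `compare(a, b)` is a placeholder for a verifier call that returns
    the preferred candidate of the pair (hypothetical interface).
    """
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in combinations(range(len(candidates)), 2):
        winner = compare(candidates[i], candidates[j])
        wins[i if winner == candidates[i] else j] += 1
    return candidates[max(wins, key=wins.get)]

# Toy stand-in judge: prefers the longer derivation (real use: an LLM verifier).
answers = ["x=2", "x=2 because 2+2=4", "x=3"]
print(pick_best(answers, lambda a, b: a if len(a) >= len(b) else b))
```

The claim in the teaser is that comparisons of this pairwise form are more reliable than asking the model to score each candidate in isolation, which makes the extra O(n²) judge calls worth it for hard reasoning problems.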