Achieve FP16-level LLM accuracy at 3-bit quantization, unlocking 1.5x faster inference than 4-bit methods on consumer GPUs.