Achieve LLaMA-level reasoning accuracy with 44% lower latency and 73% lower API costs by routing work to small models and escalating to the large model only when needed.
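The blurb does not specify the routing rule, so the following is a minimal sketch of one common approach, a confidence-based cascade: the small model answers first, and the query escalates to the large model only when the small model's confidence falls below a threshold. The stub models, confidence value, and threshold here are all illustrative assumptions, not the system described above.

```python
# Hypothetical confidence-based cascade. The real routing criterion,
# model names, and threshold are not given in the source.

def small_model(prompt):
    # Stand-in for a cheap model: returns (answer, confidence in [0, 1]).
    return ("draft answer", 0.62)

def large_model(prompt):
    # Stand-in for an expensive, assumed-more-accurate model.
    return "authoritative answer"

def cascade(prompt, threshold=0.8):
    """Answer with the small model; escalate only when it is unsure."""
    answer, confidence = small_model(prompt)
    if confidence >= threshold:
        return answer, "small"
    return large_model(prompt), "large"

answer, tier = cascade("What is 2 + 2?")
```

With the stub confidence of 0.62 and a 0.8 threshold, this query escalates; in practice the threshold trades accuracy against how often the large model is invoked.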
LLM inference gets a 2x speed boost without training, thanks to a clever technique that merges retrieval with logit-based speculation.