Edge devices can achieve a 93% reduction in time-to-first-token for local LLM inference by cooperatively caching and sharing intermediate processing states.