By quantizing KV caches along their inner dimension, InnerQ achieves up to 22% speedup in LLM decoding compared to prior art, without sacrificing accuracy.
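The teaser names the core idea: quantizing the KV cache along its inner (per-channel) dimension rather than per-token. A minimal NumPy sketch of what inner-dimension quantization might look like follows; the function names, bit width, and tensor shapes are illustrative assumptions, not InnerQ's actual implementation.

```python
import numpy as np

def quantize_inner_dim(x, bits=4):
    """Hypothetical sketch: symmetric int quantization with one scale per
    channel of the inner (last) dimension, as suggested by the description
    of quantizing KV caches along their inner dimension."""
    qmax = 2 ** (bits - 1) - 1
    # Reduce over every axis except the last, so each inner-dim channel
    # gets its own scale.
    scale = np.abs(x).max(axis=tuple(range(x.ndim - 1)), keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy KV tensor: (heads, seq_len, head_dim); head_dim is the inner dimension.
rng = np.random.default_rng(0)
kv = rng.standard_normal((2, 8, 64)).astype(np.float32)
q, s = quantize_inner_dim(kv, bits=4)
recon = dequantize(q, s)
max_err = float(np.abs(kv - recon).max())
```

With a per-channel scale, the worst-case rounding error is half a quantization step per channel, which is why channel-wise (inner-dimension) scales typically preserve accuracy better than a single tensor-wide scale.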