Search papers, labs, and topics across Lattice.
2
10
4
9
Uncertainty-driven dynamic compute allocation lets web agents outperform naive test-time scaling by 9.1% while using 2.3x fewer tokens.
Forget sparse KV caches – QuantSpec's hierarchical 4-bit quantization unlocks 2.5x speedups in long-context LLM inference with >90% acceptance rates.