LLMs can maintain long-context performance even with aggressive KV-cache eviction by learning to predict token importance and compressing evicted tokens into a latent memory.
Quantizing rollouts in LLM reinforcement-learning (RL) pipelines introduces a training-inference gap; QaRL closes this gap, yielding a +5.5 improvement on math problems.
A new quantization-aware training scheme achieves near-lossless 2-bit LLMs by progressively reducing precision and explicitly handling outlier channels.