Search papers, labs, and topics across Lattice.
5
0
7
2
Overconfident tokens, often missed by entropy-based methods, carry surprisingly dense corrective signals in on-policy distillation, allowing for near-baseline performance with <10% of tokens.
Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.
Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.
Reasoning models aren't just verbose, they're actively *harmed* by their own verbosity, but a simple self-distillation trick can compress their outputs by up to 59% while boosting accuracy by up to 16 points.
Overconfident errors in RLVR monopolize probability mass and suppress exploration, but a confidence-aware penalty fixes this and boosts mathematical reasoning performance.