Smaller LLMs can learn to predict when they'll fail, paving the way for efficient "ask for help" systems that rival the performance of much larger models.
Forget expensive long-context pre-training: knowledge distillation can transfer long-context retrieval skills to smaller models using only short-context data.
Forget agents and world models – the future of computing could be learned directly from I/O traces, turning the model itself into the computer.
Scale up offline policy training for diffusion LLMs without breaking the bank: dTRPO slashes trajectory computation costs while boosting performance by up to 9.6% on STEM tasks.
Forget scaling laws: this work shows you can get SOTA reasoning from sub-billion parameter models with *less* data, if you're smart about curation and resampling.