Sparse updates in on-policy distillation can match the performance of full dense updates with significantly reduced training overhead, challenging conventional wisdom about dense parameter updates.
LLMs still struggle to go beyond simple lookups when answering questions about tables, especially when prediction and reasoning about unobserved data are required.
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.