Search papers, labs, and topics across Lattice.
3
0
7
3
MLLMs excel at single-hop tasks but falter dramatically in open-world scenarios, revealing critical gaps in their reasoning capabilities.
Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.