Search papers, labs, and topics across Lattice.
9
0
10
4
MLLMs excel at single-hop tasks but falter dramatically in open-world scenarios, revealing critical gaps in their reasoning capabilities.
Skill0.5 achieves state-of-the-art out-of-distribution generalization in agentic RL by intelligently combining skill internalization and utilization, outperforming methods that rely solely on one or the other.
GUI agents can learn world knowledge more efficiently by internalizing causal relationships during mid-training, rather than relying on implicit learning through action annotations or reward signals in post-training.
Current LLM agents still struggle to infer and leverage user preferences from fragmented, real-world interactions, revealing a substantial gap between their capabilities and the demands of personalized decision-making.
LLM agents trained with simulated user and tool noise not only become more robust in messy real-world environments, but also surprisingly improve on clean, idealized benchmarks.
Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.
Agent-as-a-Judge can outperform LLM-as-a-Judge in complex environments, but still struggles to reliably verify agent behavior, revealing a critical gap in current LLM-based agent evaluation.
LLM agents can internalize skills via in-context RL, achieving zero-shot autonomous behavior without the token overhead and retrieval noise of traditional methods.
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.