Today's best web agents are shockingly inefficient: they achieve only 1.15% trajectory efficiency on realistic long-horizon tasks, a sign that evaluation must move beyond simple success rates.
Agentic coding gets a serious boost: distilling and reusing rollout trajectories lets Claude-4.5-Opus jump from 70.9% to 77.6% on SWE-Bench Verified.
Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.