Forget short-horizon RL: Odysseus shows that VLMs can master 100+ turn decision-making in complex games, outperforming state-of-the-art models by 3x.
More than a childhood game, Pokemon emerges as a surprisingly effective benchmark for AI, revealing critical gaps in LLMs and RL agents that existing benchmarks miss.
Automating RL environment engineering slashes costs and unlocks massive speedups (up to 22,320x!) using a recipe of prompt engineering, verification, and agent-assisted repair.
Multimodal agents still struggle with game development, solving only ~50% of tasks in a new benchmark, GameDevBench, highlighting the need for better multimodal reasoning in complex software environments.