LLMs can't rebuild software from scratch, even for widely used programs like FFmpeg and SQLite, revealing a critical gap in their ability to make high-level software architecture decisions.
Agentic coding gets a serious boost: distilling and reusing rollout trajectories lets Claude-4.5-Opus jump from 70.9% to 77.6% on SWE-Bench Verified.
LLMs can now automatically verify imperative code at scale, achieving state-of-the-art results on challenging verification benchmarks and paving the way for large-scale verified code datasets.
LLMs can now emulate debuggers, stepping through code and setting breakpoints, opening the door to more interactive and controllable neural program execution.