Independent Researcher
LLM agents can now achieve a 92% success rate in complex code repository setup by learning from past failures, a 19% improvement over existing methods.
Today's agents are surprisingly bad at real-world terminal tasks, with even frontier models failing nearly 40% of the time on everyday workflows.
Current LLMs struggle to leverage software documentation for repository-level comprehension, but high-quality documentation can boost agent performance by 20%.
LLMs struggle to build complete software from scratch, with even the best models failing more than half the time on a new CLI tool generation benchmark.