LLMs can follow detailed code-refactoring instructions, yet still fall short of matching human refactoring choices in real-world codebases, highlighting a critical gap in their ability to autonomously improve code quality.
LLM benchmark translations can be dramatically improved by test-time compute scaling, revealing a surprisingly cheap way to get more reliable multilingual evaluations.
Context files like AGENTS.md, intended to guide coding agents, often *hurt* performance and increase costs, challenging the common practice of using them.