LLMs can nail the final answer in code execution but still fail to reason about the steps to get there, exposing a critical flaw in current evaluation methods.