AI coding agents excel at translating scientific tasks into familiar formats but struggle to achieve true scientific discovery, with only 17.8% surpassing state-of-the-art benchmarks.
Enterprise agents struggle to perform well on real-world tasks: the best benchmark score reaches only 0.663, highlighting significant evaluation gaps.