Nanjing University
Today's agents are surprisingly bad at real-world terminal tasks, with even frontier models failing nearly 40% of the time on everyday workflows.
Debugging complex code agents just got easier: CodeTracer reconstructs full state transition histories, pinpointing failure origins and enabling recovery of failed runs.