Search papers, labs, and topics across Lattice.
3
0
5
3
Span-level error localization can boost deep-research agent reliability by up to 30 percentage points, revealing critical insights into where agents go wrong.
Debugging complex code agents just got easier: CodeTracer reconstructs full state transition histories, pinpointing failure origins and enabling recovery of failed runs.
Current MLLM benchmarks are missing the forest for the trees: Agentic-MME reveals that strong final-answer accuracy masks surprisingly poor tool use and planning in complex multimodal tasks.