The University of Melbourne
Current LLM benchmarks hide critical reasoning failures on long, multimodal documents; BRIDGE exposes these failures through step-level evaluation.