Search papers, labs, and topics across Lattice.
2
0
4
11
A unified assessment framework reveals hidden insights about agent performance, transforming how we evaluate AI systems.
VLMs confidently hallucinate answers even when presented with insufficient or contradictory multimodal evidence, highlighting a critical gap in their reliability.