Current MLLM agents struggle to find GUI defects. A new benchmark and evaluator reveal that the critical bottleneck is detection, and, surprisingly, simply integrating the evaluator's verifiers significantly boosts performance without retraining.
Retrieval-augmented agents get a serious reasoning boost by explicitly evaluating their own retrieval quality at each step, leading to state-of-the-art performance on multi-hop question answering.