Beijing Jiaotong University
Current MLLM agents struggle to find GUI defects. A new benchmark and evaluator reveal that the critical bottleneck is detection and that, surprisingly, simply integrating the evaluator's verifiers significantly boosts performance without retraining.