The Hong Kong University of Science and Technology
Decomposing GUI agent trajectories into verifiable milestones and auditing the evidence chain yields a 10% boost in RL training performance, outperforming single-judge reward systems.