CUHK MMLab
Decomposing GUI agent trajectories into verifiable milestones and auditing the evidence chain yields a 10% boost in RL training performance, outperforming single-judge reward systems.
VLMs can now self-evolve from *zero* data, thanks to a multi-agent RL framework that synthesizes its own visual concepts and reasoning tasks.