Search papers, labs, and topics across Lattice.
OPPO
1
0
2
GUI agents learn faster and generalize better with a new reward shaping technique that dynamically adapts to successful exploration trajectories, outperforming fixed reward schemes.