Search papers, labs, and topics across Lattice.
1
0
3
12
Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.