Wangxuan Institute of Computer Technology, Peking University
Multimodal large language models (MLLMs) are better at understanding videos than at directly grounding text queries within them, and a self-correction training loop can close this gap.