Forget complex cross-modal alignment: this method uses visual prompting with instance IDs and reinforcement learning to achieve a 20.9% m_IoU improvement on spatial-temporal video grounding.