Soochow University
Text-to-video retrieval models struggle to distinguish videos that differ only in their final state, revealing a critical gap in temporal reasoning and end-state grounding.