Achieve significantly more realistic, better lip-synchronized movie dubbing by modeling the cognitive processes of professional actors with a novel diffusion transformer architecture.
By selectively attending to question-relevant information across video frames and memory, QViC-MF achieves state-of-the-art results in long-term video understanding, highlighting the importance of feedback-driven perception.