Central Media Technology Institute, Huawei Technologies Ltd.
MLLMs are better at understanding videos than directly grounding text queries within them, and a self-correction training loop can close the gap.
Open-source UniTalking rivals closed-source giants like Veo3 and Sora2 in talking-head video realism, thanks to its multi-modal transformer and pre-trained video priors.