Instead of training separate video diffusion models for each multimodal task, UniVidX learns a single model that handles diverse pixel-aligned video generation problems.
Zero-shot synthesis of articulated human-object interactions is now possible by treating diffusion-generated videos as supervision for 4D scene reconstruction, unlocking physically grounded interactions beyond rigid manipulation.