Forget unimodal tasks: UniM throws down the gauntlet for truly unified multimodal AI, demanding that models juggle any combination of text, image, audio, video, code, document, and 3D inputs and outputs in a single interleaved stream.
Overcome the scarcity of 4D training data by cleverly borrowing spatial understanding from 3D models and temporal dynamics from video models.
Achieve SOTA joint audio-video generation with JavisDiT++ using just 1M public training examples, rivaling the performance of models trained on proprietary datasets.