Forget unimodal tasks: UniM throws down the gauntlet for truly unified multimodal AI, demanding that models juggle any combination of text, image, audio, video, code, document, and 3D inputs and outputs in a single, interleaved stream.
Overcome the scarcity of 4D training data by cleverly borrowing spatial understanding from 3D models and temporal dynamics from video models.