Cosmos 3 sets a new benchmark for omnimodal models, outperforming existing state-of-the-art models on both Text-to-Image and Image-to-Video tasks.
By decoupling MLLM instruction tuning from DiT alignment, DuoGen achieves state-of-the-art interleaved multimodal generation without costly unimodal pretraining.