Cosmos 3 sets a new benchmark for omnimodal models, outperforming the existing state of the art in both Text-to-Image and Image-to-Video tasks.
LongLive-RAG gives long video generation a searchable memory of past latents, drastically reducing error accumulation.
Real-time, high-resolution video editing is now possible on a single consumer GPU, thanks to a novel hybrid diffusion transformer and system-level optimizations that achieve 24 FPS at 1280x704.
Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.
By structuring diffusion-based driving models around a "scaffold" of frozen structural tokens, Fast-dDrive achieves a 12x speedup over autoregressive baselines while improving trajectory accuracy.