Search papers, labs, and topics across Lattice.
2
0
5
4
Despite impressive visual fidelity, today's generative models still stumble on basic physical, causal, and spatial reasoning tasks, revealing a "logical desert" beneath the surface.
Finally, a single model handles multi-modal video generation, inpainting, and editing at cinematic resolutions with synchronized audio, all while accepting diverse inputs like text, images, video clips, and audio references.