Forget noisy pseudo-labels: SpatialEvo unlocks self-supervised 3D spatial reasoning by generating perfectly accurate training data directly from scene geometry.
Multimodal LLMs still struggle to faithfully recreate webpages from videos, particularly in capturing fine-grained style and motion, despite advances in other areas.
Forget fuzzy language: CoCo uses executable code as Chain-of-Thought to generate images with fine-grained control and precision, outperforming existing methods on complex scenes.
Forget unimodal tasks: UniM throws down the gauntlet for truly unified multimodal AI, demanding that models juggle any combination of text, image, audio, video, code, documents, and 3D inputs and outputs in a single, interleaved stream.
Real-time AI companions can now proactively interact with users thanks to Proact-VL, a framework that balances response latency, content quality, and video understanding.