GRAIL achieves an impressive 84% success rate in real-world object pick-up tasks using only synthetic data, revolutionizing humanoid robot training.
Cosmos 3 sets a new benchmark for omnimodal models, outperforming the existing state of the art in both Text-to-Image and Image-to-Video tasks.
Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.
Ditch slow, token-by-token box generation: LocateAnything's Parallel Box Decoding (PBD) boosts VLM grounding speed and accuracy by decoding entire bounding boxes at once.
Forget everything you thought you knew about linear attention: decoupling erase and write operations unlocks significantly better long-context retrieval.
Novel token-reduction techniques let multimodal models reach state-of-the-art performance on real-world tasks such as document understanding and audio-video comprehension, with significantly reduced inference latency.
Forget painstakingly engineering robot behaviors: DreamZero learns directly from video of other robots or even humans, adapting to new tasks and bodies with just minutes of data.