Ditch the clunky architectures: a single diffusion model can now handle vision, language, and robot control, achieving state-of-the-art manipulation performance.
LLaVA-VLA, a practical vision-language-action (VLA) model, achieves strong generalization and versatility on a new benchmark, CEBench, while running on consumer-grade GPUs and without the need for costly pre-training.
By aligning latent representations with multiple visual foundation models, FRAPPE offers a more scalable and data-efficient way to imbue generalist robotic policies with robust world-awareness.