Vision-language models (VLMs) can now drive embodied agents to navigate complex environments with unprecedented efficiency, thanks to a novel framework that bridges the gap between 2D semantic understanding and 3D spatial reasoning.
GPT-4o now has open-source competition: Ming-Omni matches its modality support in a single, unified model capable of perception and generation across image, text, audio, and video.