Forget ad-hoc VLA design: here are 12 key ingredients, validated in a unified framework, for building performant Vision-Language-Action models.
Unlock real-time 4D scene understanding from monocular video with a novel "encode-once, query-anywhere and anytime" framework that jointly models geometry and motion.