Forget ad-hoc Vision-Language-Action (VLA) design: here are 12 key ingredients, validated in a unified framework, for building performant VLA models.
Achieve surprisingly strong multimodal understanding and generation with a simple connector between off-the-shelf LLMs and diffusion models, using only a fraction of the parameters of larger models.