Get state-of-the-art spoken QA performance by adding lightweight speech modules to frozen vision-language (VL) models and training on synthetically generated speech data, sidestepping the need for massive multimodal datasets.
Muon's "one-size-fits-all" spectral update is holding back your models: Mousse adapts to curvature and cuts training time by 12%.
A 4B-parameter model, InternVL-U, punches above its weight, outperforming 14B-parameter models in multimodal generation and editing by using a novel data synthesis pipeline and architecture.
Monocular vision, combined with a physically realistic simulator, enables a 70% speedup in robotic endoluminal surgery tasks.