Search papers, labs, and topics across Lattice.
3
0
6
Get state-of-the-art spoken QA performance by adding lightweight speech modules to frozen VL models and training on synthetically generated speech data, sidestepping the need for massive multimodal datasets.
By explicitly modeling visibility, VSDiffusion generates more geometrically plausible and realistic shadows, outperforming prior methods on a challenging image composition task.
Forget trying to wrangle dynamic 4D scenes with recurrent networks – DynamicVGGT achieves state-of-the-art reconstruction accuracy using a surprisingly effective feed-forward approach.