Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.
Get up to 1.79x faster ViT inference on high-resolution images without sacrificing accuracy by surgically replacing full-attention blocks with cheaper alternatives *after* pre-training.
Novel token-reduction techniques let multimodal models reach state-of-the-art performance on real-world tasks such as document understanding and audio-video comprehension at significantly lower inference latency.