Ditch slow, token-by-token box generation: LocateAnything's Parallel Box Decoding (PBD) boosts VLM grounding speed and accuracy by decoding entire bounding boxes at once.
Multimodal models can now achieve state-of-the-art performance on real-world tasks such as document understanding and audio-video comprehension, with significantly lower inference latency, thanks to novel token-reduction techniques.