Search papers, labs, and topics across Lattice.
4
0
8
7
Cosmos 3 sets a new benchmark for omnimodal models, outperforming existing state-of-the-art in both Text-to-Image and Image-to-Video tasks.
Untangling multilayer networks just got easier: T-GINEE uses tensors to explicitly model cross-layer dependencies, outperforming methods that treat layers independently.
Ditch slow, token-by-token box generation: LocateAnything's Parallel Box Decoding (PBD) boosts VLM grounding speed and accuracy by decoding entire bounding boxes at once.
Current multimodal LLMs choke on long-form video understanding, either forgetting details or getting lost in the timeline, but a new agentic architecture with dynamic memory management offers a promising fix.