Search papers, labs, and topics across Lattice.
Achieve minute-level navigable video world models by combining the strengths of explicit 3D patch memory with implicit generative modeling.
MC3D models can now generalize to unseen camera configurations thanks to a new framework that explicitly accounts for spatial prior discrepancies.
Despite matching or exceeding human expert performance on generating potential diagnoses, current MLLMs struggle to synthesize multimodal clinical evidence for final diagnosis, revealing a critical gap in their clinical reasoning abilities.
Object hallucination in MLLMs can be significantly reduced by simply masking salient visual features during contrastive decoding.
MLLMs can now reason about road traffic accidents by fusing remote sensing imagery and structured data, unlocking interpretable insights previously inaccessible to traditional methods.