Achieve minute-level navigable video world models by combining the strengths of explicit 3D patch memory with implicit generative modeling.
MC3D models can now generalize to unseen camera configurations thanks to a new framework that explicitly accounts for spatial prior discrepancies.
A U-Net trained on a massive custom dataset achieves a significant leap in 3D cone localization accuracy, paving the way for more robust autonomous racing systems.
Stop training your monocular 3D object detection (M3OD) models on the same old entangled data: this method decomposes and recomposes objects, scenes, and camera poses to generate diverse training examples on the fly, boosting performance without needing more real-world data.
Soft pseudo-labels, theoretically equivalent to hard labels when perfectly calibrated, tank performance in cross-domain semantic segmentation, motivating a new calibration framework.
An end-to-end system extracts funny scenes from movies with 87% accuracy, opening new avenues for automated content repurposing.
Object hallucination in MLLMs can be significantly reduced by simply masking salient visual features during contrastive decoding.
MLLMs can now reason about road traffic accidents by fusing remote sensing imagery and structured data, unlocking interpretable insights previously inaccessible to traditional methods.