Achieve seamless vector map generation across all land-cover classes from aerial imagery by enforcing shared-edge consistency, an approach that outperforms class-specific methods.
In streaming video, answering at the wrong time can be as bad as answering incorrectly, so this work introduces a framework that learns when to answer based on the availability of supporting visual evidence.
Forget direct prompt editing: this agentic planning framework, powered by offline RL and synthetic data, masters complex image styling by breaking it down into interpretable tool sequences.
Ditch mean pooling in your geospatial foundation models: richer pooling methods like GeM can boost accuracy by up to 5% and slash the geographic generalization gap by 40%.
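For context (not a detail from the paper itself): GeM, generalized-mean pooling from the image-retrieval literature, raises features to a learnable power p before averaging, interpolating between mean pooling (p = 1) and max pooling (p → ∞). A minimal PyTorch sketch of the generic layer, not the authors' exact head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean (GeM) pooling over a spatial feature map.

    p = 1 recovers average pooling; p -> inf approaches max pooling,
    so a single learnable p lets the model interpolate between them.
    """
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))  # learnable exponent
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature map from a backbone
        x = x.clamp(min=self.eps).pow(self.p)          # elementwise x^p; clamp keeps the power stable
        x = F.avg_pool2d(x, kernel_size=x.shape[-2:])  # spatial mean over the whole map
        return x.pow(1.0 / self.p).flatten(1)          # (batch, channels) descriptor

# Usage: swap this in for the mean-pooling head of an encoder.
feats = torch.randn(2, 768, 14, 14)  # e.g., patch features reshaped to a grid
print(GeM()(feats).shape)            # torch.Size([2, 768])
```

Because the only new parameter is the exponent, it is a drop-in replacement for a mean-pooling head.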
LLMs can now answer questions on complex documents more accurately, thanks to a new system that understands layout and the hierarchical relationships between document components.
Forget monolithic models: pMoE shows that ensembling diverse expert prompts within a single model framework yields surprisingly large gains in visual adaptation across a wide range of tasks.
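The blurb doesn't spell out pMoE's routing, so the following is only a generic illustration of mixing several learnable prompt experts inside one frozen model; every name and shape here is an assumption, not pMoE's actual design:

```python
import torch
import torch.nn as nn

class PromptMoE(nn.Module):
    """Generic mixture-of-prompt-experts: several learnable prompts,
    softly combined per input by a lightweight gate, then prepended
    to the frozen backbone's token sequence."""
    def __init__(self, num_experts: int = 4, prompt_len: int = 8, dim: int = 768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_experts, prompt_len, dim) * 0.02)
        self.gate = nn.Linear(dim, num_experts)  # scores experts from the first ([CLS]-like) token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim)
        weights = self.gate(tokens[:, 0]).softmax(dim=-1)           # (batch, experts)
        mixed = torch.einsum("be,eld->bld", weights, self.prompts)  # (batch, prompt_len, dim)
        return torch.cat([mixed, tokens], dim=1)                    # prepend the mixed prompt
```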
Achieve up to 57% better point cloud compression by combining the generalization of pretrained models with the robustness of implicit neural representations.
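As background on the second ingredient (a generic sketch, not this paper's hybrid design): an implicit neural representation deliberately overfits a small coordinate network to each point cloud, so the trained weights themselves become the compressed payload. A toy example with hypothetical shapes and sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny coordinate MLP fit to one cloud: xyz -> logit of "point lies on the shape".
# The trained weights, not the points, are what gets stored or transmitted.
inr = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Toy cloud: points on a sphere of radius 0.5, plus random off-surface negatives.
on_surface = F.normalize(torch.randn(1024, 3), dim=1) * 0.5
off_surface = torch.rand(1024, 3) * 2 - 1          # uniform in [-1, 1]^3
coords = torch.cat([on_surface, off_surface])
labels = torch.cat([torch.ones(1024, 1), torch.zeros(1024, 1)])

opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for _ in range(500):                               # overfitting is the point: one net per cloud
    opt.zero_grad()
    loss = F.binary_cross_entropy_with_logits(inr(coords), labels)
    loss.backward()
    opt.step()
```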
VisRAG models can now handle real-world image degradations like blur and shadows without sacrificing accuracy, thanks to a new causality-guided architecture that disentangles semantics from visual distortions.
ImageNet-pretrained CNNs can spot looted archaeological sites from space with surprising accuracy, leaving traditional methods in the dust.
Ditch the tracker: HiMAP offers a robust, tracking-free trajectory prediction fallback that actually rivals tracking-based methods in autonomous driving scenarios.
Diffusion models can now efficiently tackle rare event sampling in molecular dynamics, unlocking rapid calculation of folding free energies in minutes to hours on a GPU.
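The jump from "sampling rare events" to "free energies in minutes" rests on a textbook identity rather than anything specific to this paper: once both metastable states can be sampled, the folding free energy is just the log-ratio of their populations,

\Delta F_{\text{fold}} = -k_B T \,\ln\!\left(\frac{p_{\text{folded}}}{p_{\text{unfolded}}}\right),

so a fast sampler translates directly into a fast free-energy estimate.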
Humanoid robots can now learn complex, terrain-aware motions directly from video using a low-cost pipeline, eliminating the need for expensive MoCap data and manual motion design.
By predicting tracking models rather than image features, GOT-JEPA unlocks more robust object tracking, even when objects are heavily occluded or the environment is dynamic.
LLMs and VLLMs can team up to generate synthetic image data so good it beats state-of-the-art methods and boosts performance on rare classes and open-vocabulary object detection.