Latent reasoning can beat explicit Chain-of-Thought – but only if you force it to learn causal dynamics via a visual world model, not just language.
Endowing VLMs with intrinsic 3D geometric awareness and physical interaction cues via XEmbodied substantially boosts performance on spatial reasoning and embodied tasks, surpassing existing 2D image-text pretrained models.
Image-goal navigation gets a boost from hierarchical reasoning, using vision-language models for high-level planning and online RL for low-level execution, significantly reducing wandering and improving success in complex environments.
Achieve state-of-the-art hyperspectral image denoising by adaptively balancing data fidelity and noise priors, outperforming existing methods that overemphasize image priors.
Achieve high-quality image editing without sacrificing source fidelity by straightening the latent trajectory and adaptively blending source and target velocities.
Autonomous driving models no longer need to compromise between spatial perception and semantic reasoning: UniDriveVLA's expert decoupling unlocks state-of-the-art performance across a range of driving tasks.
Ditch language descriptions: this new driving model leverages dense 3D geometry for superior autonomous driving performance and cross-camera generalization.