General-purpose vision-language models stumble when looking down from drones, but a new benchmark shows that multi-task learning can substantially improve their aerial reasoning skills.
Ditch the optimization: MoRe achieves real-time 4D scene reconstruction from monocular video using a feedforward transformer that disentangles motion and structure.
By disentangling structure and motion in the latent space, CoWVLA achieves superior visuomotor learning compared to standard world-model and latent-action approaches.
VLMs can now excel at industrial anomaly detection by injecting domain-specific facts and aligning with expert preferences, achieving state-of-the-art zero-shot and one-shot performance.