Image diffusion models get a boost by letting their semantic feature space evolve during training, leading to faster convergence and better image quality.
MLLMs can get a surprising visual reasoning boost from a simple trick: adding just a dash of visually grounded self-supervision to instruction tuning.
By predicting future scene structure in the feature space of a frozen vision foundation model before rendering pixels, Re2Pix achieves state-of-the-art video prediction with improved temporal consistency and perceptual quality.
Franca leapfrogs proprietary vision models like DINOv2 and CLIP, proving open-source can win on performance and transparency in visual representation learning.
Video pre-training can drive autonomous vehicles, but scaling model size doesn't always guarantee safer closed-loop driving.