Forget training wheels: this training-free method leverages uncertainty to guide vision-language models to the right image regions, boosting performance on detail-oriented tasks.
MLLMs can now efficiently process 10K-frame videos without training, by adaptively selecting tokens based on the model's own uncertainty about the content.
DROID-SLAM achieves robust real-time RGB SLAM in dynamic environments by explicitly modeling per-pixel uncertainty, outperforming existing methods that struggle with unknown dynamic objects and cluttered scenes.
Generate realistic egocentric videos with consistent 3D hand articulation, even with severe occlusions, by using sparse 3D hand joints as control signals.