VLA models struggle not with *what* they see, but with *how* they see it: FocusVLA reveals that directing attention to task-relevant visual regions unlocks significant performance gains in robotic manipulation.
By leveraging readily available video segmentation masks, this method boosts DINOv2's point-tracking performance by over 14%, a surprisingly effective way to inject temporal awareness into static image-pretrained models.
This approach achieves state-of-the-art semi-supervised crowd instance segmentation and counting by generating high-quality mask supervision from sparse annotations, effectively bridging the gap between the two tasks.