Image-size agnostic vision transformers are now a practical reality, thanks to a new self-supervised pretraining method that maintains constant computational cost regardless of input resolution.
Robots gain a crucial boost in robustness by learning to "see" and predict how objects will move, rather than merely reacting to the current frame.
Teaching robots to manipulate objects just got easier: OCRA learns directly from human demonstration videos by focusing on object interactions and incorporating tactile feedback.
MLLMs can "hear" a little, but EgoSound reveals they're still largely deaf to the nuances of sound in egocentric video, especially when it comes to spatial and causal reasoning.