Training a single point cloud encoder across diverse 3D domains not only improves perception but also unlocks emergent behaviors and enhances robotic manipulation and spatial reasoning.
Spatial reasoning could be the secret sauce for building generalist embodied agents that can drive, manipulate objects, and fly drones, all within a single model.
CLIP's image tokens struggle to aggregate information from spatially or semantically related regions. DeCLIP addresses this by decoupling self-attention and distilling knowledge from vision foundation models (VFMs) and diffusion models.