Search papers, labs, and topics across Lattice.
3
0
5
Robots can now focus on the *right* body parts for interaction, thanks to a new vision-language model that understands human motion commands and precisely localizes task-relevant 3D keypoints.
Rethinking IRSTD as a centroid regression problem with single-point supervision achieves competitive detection performance with significantly reduced computational cost, challenging the dominance of pixel-level segmentation approaches.
Forget generic CoT: Embed-RL uses reinforcement learning to generate reasoning traces that are explicitly optimized for multimodal embedding tasks, leading to significant performance gains.