Shanghai Jiao Tong University
CineDance-1M provides over 1 million high-quality, structured video samples for open-source cinematic audio-video generation.
ST-DRC achieves top-tier identity preservation in text-to-video generation without compromising semantic fidelity.
SAMOSA makes SAM-based tracking robust to complex motion and occlusions by explicitly modeling target dynamics and enforcing geometric and semantic consistency across frames.
Robots can now focus on the *right* body parts for interaction, thanks to a new vision-language model that understands human motion commands and precisely localizes task-relevant 3D keypoints.
Rethinking infrared small target detection (IRSTD) as a centroid regression problem with single-point supervision achieves competitive detection performance at significantly reduced computational cost, challenging the dominance of pixel-level segmentation approaches.
Forget generic chain-of-thought (CoT): Embed-RL uses reinforcement learning to generate reasoning traces explicitly optimized for multimodal embedding tasks, leading to significant performance gains.