Sun Yat-sen University
CoEvoer achieves unprecedented accuracy in upper-body pose estimation by leveraging cross-dependencies among facial, hand, and torso features.
By reasoning about intentions across time, CiT enables more accurate trajectory prediction, especially when the ego-agent's own motion is taken into account.
Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming WER and speaker similarity with significantly lower latency by operating directly in codec space.
Achieve human-like full-duplex voice interactions with SoulX-Duplug, a plug-and-play module that slashes latency and improves turn management by acting as a semantic VAD.
Forget task-specific architectures: a single Vision-Language-Action foundation model, ABot-N0, now dominates embodied navigation across five distinct tasks.