Yuxiang Zhao

Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming WER and speaker similarity with significantly lower latency by operating directly in codec space.

Yuxiang Zhao, Tianrui Wang +5

Inference & Quantization · Speech & Audio

Mar 16, 2026 · also SJTU, Soul AI Lab, SYSU

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Achieve human-like full-duplex voice interactions with SoulX-Duplug, a plug-and-play module that slashes latency and improves turn management by acting as a semantic VAD.

Ruiqi Yan, Wenxi Chen, Zhanxun Liu +18

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Feb 12, 2026 · also Baidu, HFUT, Huawei, SYSU

ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation

Forget task-specific architectures: a single Vision-Language-Action foundation model, ABot-N0, now dominates embodied navigation across five distinct tasks.

Zedong Chu, Xiaolong Wu, Yanfen Shen +37

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Robotics & Embodied AI
