Shinji Watanabe

Dixtral achieves up to 29% absolute improvement in speaker-attributed transcription accuracy by leveraging diarization masks without risking catastrophic forgetting.

Alexander Polok, Samuele Cornell, Sathvik Udupa +3

Multimodal Models · Speech & Audio

Jun 11, 2026

CMU ML · 1w ago · also Brno University of Technology

Endpoint Anticipation for Low-Latency Spoken Dialogue

Anticipating dialogue endpoints up to 2.56 seconds ahead can slash latency by over half while enhancing computational efficiency in real-time speech interactions.

Sathvik Udupa, Shinji Watanabe, Petr Schwarz +2

Speech & Audio

Jun 9, 2026

CMU ML · 1w ago · also Brno University of Technology, JHU, Kyoto, Sheffield +1

CS-YODAS: A Mined Dataset of In-the-Wild Code-Switched Speech

A mined dataset of 313 hours of real-world code-switched speech reveals code-switching patterns and frequencies previously overlooked in multilingual research.

Brian Yan, Qingzheng Wang, Matthew Wiesner +9

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Jun 8, 2026

CMU ML · 1w ago · also USC

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

Incremental speech quality assessment can be dramatically improved by modeling it as a multi-resolution task, achieving a 48% reduction in error on partial audio inputs.

Zhuoyan Tao, Jiatong Shi, Hye-jin Shim +1

Speech & Audio

Papers (5)