Shanghai Jiao Tong University
Continuous-target modeling reveals a shared semantic mapping for ASR and S2TT, challenging conventional views on their error sources.
HoliDubber revolutionizes video dubbing by seamlessly integrating speech and sound effects from a single text prompt, outperforming traditional methods in quality and synchronization.
Current audio editing models fall far short on complex tasks, with an Exact Match Rate below 5%, exposing a critical need for improvement.
Strong translation quality doesn't guarantee high speech or temporal fidelity, revealing critical gaps in existing evaluation practices for speech translation systems.
LLMs can transform ambiguous spoken signals into seamless user interactions by diagnosing the *cause* of ASR errors (perception, comprehension, deletion) and proactively requesting targeted clarification.
ASR systems can be more trustworthy: this work shows how to train them to abstain from transcribing uncertain segments.
Reasoning across languages doesn't have to break the bank: a new framework slashes token costs by over 50% while maintaining accuracy, especially boosting performance in low-resource languages.