Shanghai Jiao Tong University, Hunyuan Team, Tencent
Current audio editing models fall well short on complex tasks, where their Exact Match Rate stays below 5%.
LLMs can transform ambiguous spoken signals into seamless user interactions by diagnosing the *cause* of ASR errors (perception, comprehension, deletion) and proactively requesting targeted clarification.