Shanghai Jiao Tong University
Continuous-target modeling reveals a shared semantic mapping for automatic speech recognition (ASR) and speech-to-text translation (S2TT), challenging conventional views on their error sources.
Current audio editing models achieve an Exact Match Rate below 5% on complex tasks, exposing a critical need for improvement.
Real-time audio interaction is now possible with a unified model that not only performs traditional tasks but also proactively responds to audio stimuli.