The Hong Kong University of Science and Technology (, Wuhan University
Adversarial attacks on ASR systems become substantially stronger, by +26.6 WER, when they target feature representations instead of raw audio, exposing a significant blind spot in current robustness evaluations.
Finally, a single model generates realistic and coherent audio scenes from text, rivaling specialized models and even approaching real-world recordings.