National Taiwan University
Speech quality assessment is skewed: male listeners consistently give higher scores than female listeners, and standard MOS models learn and perpetuate this bias.
Contrastive decoding improves audio language models mainly by correcting specific error types, such as uncertainty and audio absence, but don't expect it to fix flawed reasoning.
Overcome LALMs' struggles with localized dialectal prosody: a new Taiwanese audio-text dataset and fine-tuning strategy boost accuracy by 6.5% on the TAU Benchmark.
Audio watermarks can now survive neural resynthesis, thanks to a latent-space embedding technique that resists the semantic compression applied by modern audio codecs.