National Taiwan University
LALMs reveal their hidden biases when you let them generate freely from real human voices, and gender stereotypes are more pronounced than accent biases.
Speech-to-speech translation can now convey laughter and tears with human-like fidelity, thanks to a surprisingly data-efficient approach leveraging LoRA experts.
Text-only LLMs already contain surprisingly diverse levels of auditory knowledge, and this pre-existing knowledge strongly predicts their performance when adapted for audio-language tasks.
Speech quality assessment is skewed: male listeners consistently give higher scores than female listeners, and standard MOS models learn and perpetuate this bias.
Contrastive Decoding's gains for audio language models hinge on fixing specific error types, like uncertainty and audio absence, but don't expect it to magically repair flawed reasoning.
Audio watermarks can now survive neural resynthesis, thanks to a latent space embedding technique that resists semantic compression by modern audio codecs.
Overcome LALMs' struggles with localized dialectal prosody: a new Taiwanese audio-text dataset and fine-tuning strategy boost accuracy by 6.5% on the TAU Benchmark.