Text-only LLMs already contain surprisingly diverse levels of auditory knowledge, and this pre-existing knowledge strongly predicts their performance when adapted for audio-language tasks.
Speech quality assessment is skewed: male listeners consistently give higher scores than female listeners, and standard MOS models learn and perpetuate this bias.
LALMs struggle with localized dialectal prosody: a new Taiwanese audio-text dataset and fine-tuning strategy boost accuracy by 6.5% on the TAU Benchmark.