Audio-Language models are cheating on benchmarks, acing tests even when they barely listen.
Text-only LLMs already contain surprisingly rich auditory knowledge, and this pre-existing knowledge strongly predicts their performance when adapted for audio-language tasks.
LALMs struggle to handle multiple concurrent audio inputs, but a simple input permutation strategy can significantly boost their multi-audio understanding without retraining.
Overcoming LALMs' struggles with localized dialectal prosody: a new Taiwanese audio-text dataset and fine-tuning strategy boost accuracy by 6.5% on the TAU Benchmark.