RLVR, the dominant training paradigm for audio language models, may be turning them into unfeeling "answering machines" that excel on benchmarks but fail the vibe check.
Mimicking human cognition, FLAIR lets dialogue models "think while listening," boosting performance without adding latency.
Turns out your always-on speech dialogue model is leaking speaker identity like a sieve, but a simple feature-domain anonymization technique can boost privacy by 3.5x with minimal impact on performance.
Ditch the training data: this intelligibility-guided approach fuses noisy and enhanced speech for robust ASR without needing a separate neural predictor.
Agent systems that iteratively orchestrate tools and reason across modalities significantly outperform single models in audio reasoning, highlighting a promising path toward explainable audio intelligence.