Forget supervised fine-tuning: RL alone can unlock high-quality chain-of-thought reasoning in audio-language models, even starting from a model with no prior CoT capability.
Reinforcement learning with audio-text semantic rewards can overcome confirmation bias in test-time adaptation, leading to more robust ASR in noisy and accented speech environments.