Forget supervised fine-tuning: RL alone can unlock high-quality chain-of-thought reasoning in audio-language models, even starting from a model with no prior CoT capability.