Forget supervised fine-tuning: RL alone can unlock high-quality chain-of-thought reasoning in audio-language models, even starting from a model with no prior CoT capability.
Ditching VAE acoustic latents for semantic latents unlocks more semantically meaningful audio generation, outperforming traditional methods on AudioCaps.