X-LANCE Lab, Shanghai Jiao Tong University, China; Shanghai Innovation Institute, China
Jointly training on readily available clip-level descriptions and scarce frame-level annotations yields state-of-the-art audio understanding by bridging the gap between global semantics and local details.
Online reinforcement learning with rewards from a large audio language model pushes text-to-audio generation to a new state of the art, even with a relatively small 470M-parameter model.