By compressing a 60-second audio clip into just 788 tokens, this new autoencoder cuts both encoding time and latent rate, making generative audio modeling far more tractable.
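For a sense of scale, 788 tokens for 60 seconds of audio works out to an average latent rate of roughly 13 tokens per second. The sketch below only does that division. `token_rate` is a hypothetical helper, and it assumes the 788 tokens are spread evenly across the clip, which the summary does not state.

```python
# Hypothetical helper (not from the paper): average latent token rate for a clip,
# assuming tokens are distributed uniformly over its duration.
def token_rate(num_tokens: int, duration_s: float) -> float:
    """Return the number of latent tokens per second of audio."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return num_tokens / duration_s


rate = token_rate(788, 60.0)
print(f"{rate:.2f} tokens/s")  # prints "13.13 tokens/s"
```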
A new model, TAC, is trained on synthetic data to generate temporally grounded captions that can then be fed into LLMs, achieving state-of-the-art results in audio and audio-visual reasoning.