Search papers, labs, and topics across Lattice.
Forget paired video-music training data: V2M-Zero aligns video and music by matching the *timing* of changes within each modality, not the content itself.
Compressing 60-second audio into just 788 tokens, this new autoencoder makes generative audio modeling far more tractable by slashing encoding time and latent rates.
A new model, TAC, trained on synthetic data, achieves state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that LLMs can then reason over.
Generate entire multi-instrumental tracks in a single pass with Stemphonic, a new diffusion/flow model that is 25-50% faster than existing stem generation methods while producing higher-quality output.
Forget RLHF and DPO – DRAGON lets you fine-tune generative models with rewards that compare entire *distributions* of outputs, unlocking better control and quality without human preference data.