AudioChat tackles the complexity of "audio stories" by using LLM-driven tool-calling agents to simulate user interactions, enabling audio foundation models to generate, edit, and understand complex multi-source acoustic scenes.
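To make the tool-calling idea concrete, here is a minimal Python sketch of an LLM-driven agent loop for audio-scene tasks. The tool names, their signatures, and the `call_llm` stub are assumptions for illustration only, not AudioChat's actual interface.

```python
# Hypothetical tool-calling loop: an LLM picks an audio tool, we run it, and the
# result is fed back into the conversation. Tools and call_llm are stand-in stubs.
import json
from typing import Callable, Dict


def generate_scene(description: str) -> str:
    """Stub: ask an audio foundation model to synthesize a multi-source scene."""
    return f"audio://generated/{hash(description) & 0xFFFF}"


def edit_scene(audio_uri: str, instruction: str) -> str:
    """Stub: apply an edit (add, remove, or move a sound source) to a scene."""
    return f"{audio_uri}?edit={instruction.replace(' ', '_')}"


TOOLS: Dict[str, Callable[..., str]] = {
    "generate_scene": generate_scene,
    "edit_scene": edit_scene,
}


def call_llm(messages) -> str:
    """Stub standing in for a real LLM call that returns a JSON tool invocation."""
    return json.dumps({"tool": "generate_scene",
                       "args": {"description": "rain, distant thunder, a passing car"}})


def agent_step(messages):
    """One simulated turn: the LLM chooses a tool, we execute it, and the result
    is appended to the conversation for the next turn."""
    decision = json.loads(call_llm(messages))
    result = TOOLS[decision["tool"]](**decision["args"])
    messages.append({"role": "tool", "name": decision["tool"], "content": result})
    return messages


if __name__ == "__main__":
    history = [{"role": "user",
                "content": "Build a rainy street scene, then bring the thunder closer."}]
    print(agent_step(history)[-1])
```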
Seamlessly extend and morph audio clips using a diffusion model with masked latents and classifier-free guidance, producing near-realistic results and opening new creative possibilities for sound design.
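The sketch below illustrates the general masked-latent inpainting pattern with classifier-free guidance that the blurb describes: latents for the existing clip are re-noised and held fixed each step while the new region is denoised under guidance. The denoiser stub, the cosine schedule, and all shapes are assumptions, not the paper's actual model.

```python
# Minimal NumPy sketch of audio extension via diffusion inpainting with CFG.
import numpy as np


def extend_latents(denoiser, known, new_len, cond, steps=50, guidance=3.0, seed=0):
    """Keep the known clip's latents, generate latents for the appended region."""
    rng = np.random.default_rng(seed)
    dim = known.shape[1]
    total = known.shape[0] + new_len
    mask = np.zeros((total, 1))
    mask[: known.shape[0]] = 1.0                                    # 1 = original clip
    known_full = np.concatenate([known, np.zeros((new_len, dim))])  # pad to full length

    # Toy cosine noise schedule; abar[t] is the cumulative signal fraction at step t.
    abar = np.clip(np.cos(0.5 * np.pi * np.linspace(0, 1, steps + 1)) ** 2, 1e-4, 1.0)

    x = rng.standard_normal((total, dim))                           # start from pure noise
    for i in reversed(range(1, steps + 1)):
        a_t, a_prev = abar[i], abar[i - 1]
        # Classifier-free guidance: blend conditional and unconditional noise estimates.
        eps_c = denoiser(x, i, cond)
        eps_u = denoiser(x, i, None)
        eps = eps_u + guidance * (eps_c - eps_u)
        # Deterministic (DDIM-style) step toward the previous noise level.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
        # Re-impose the known clip, noised to the matching level, so it stays intact.
        noised_known = (np.sqrt(a_prev) * known_full
                        + np.sqrt(1.0 - a_prev) * rng.standard_normal(x.shape))
        x = mask * noised_known + (1 - mask) * x
    return x


if __name__ == "__main__":
    toy_denoiser = lambda x, t, cond: np.zeros_like(x)   # placeholder noise predictor
    out = extend_latents(toy_denoiser, known=np.zeros((100, 64)), new_len=50, cond="wind chimes")
    print(out.shape)                                      # (150, 64)
```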
A new model, TAC, uses synthetic training data to achieve state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that can then be fed into LLMs.
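A small sketch of the caption-then-reason pattern described above: timestamped captions are serialized into a text prompt and handed to an LLM. The caption format and the `ask_llm` stub are hypothetical, not TAC's actual pipeline.

```python
# Temporally grounded captions -> LLM prompt (illustrative only).
from typing import List, Tuple


def build_prompt(captions: List[Tuple[float, float, str]], question: str) -> str:
    """Turn (start_s, end_s, caption) triples into a temporally grounded prompt."""
    lines = [f"[{start:05.1f}s - {end:05.1f}s] {text}" for start, end, text in captions]
    return ("Audio events:\n" + "\n".join(lines)
            + f"\n\nQuestion: {question}\nAnswer by reasoning over the timeline.")


def ask_llm(prompt: str) -> str:
    """Stub standing in for a call to any instruction-tuned LLM."""
    return "(model answer)"


if __name__ == "__main__":
    caps = [(0.0, 3.2, "a dog barks twice"),
            (3.2, 7.8, "a car engine starts and idles"),
            (7.8, 10.0, "the dog barks again, farther away")]
    print(ask_llm(build_prompt(caps, "Did the dog bark before or after the engine started?")))
```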