Large audio-language models (LALMs) can be easily tricked into "hearing" things that aren't there, with targeted attacks succeeding at rates as high as 95%.
AudioChat tackles complex "audio stories" by using LLM-driven tool-calling agents to simulate user interactions, enabling audio foundation models to generate, edit, and understand multi-source acoustic scenes.
Seamlessly extend and morph audio clips using a diffusion model with masked latents and classifier-free guidance, achieving near-realistic results and opening new creative possibilities for sound design.
A new model, TAC, uses synthetic training data to achieve state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that can then be fed into LLMs.