AudioChat tackles the complexity of "audio stories" by using LLM-driven tool-calling agents to simulate user interactions, enabling audio foundation models to generate, edit, and understand complex multi-source acoustic scenes.
A new model, TAC, is trained on synthetic data to achieve state-of-the-art audio and audio-visual reasoning: it generates temporally grounded captions that can then be fed to LLMs for downstream reasoning.