LALMs can be easily tricked into "hearing" things that aren't there, with success rates as high as 95% on targeted attacks.
A new model, TAC, achieves state-of-the-art audio and audio-visual reasoning by training on synthetic data to generate temporally grounded captions, which can then be fed into LLMs.
A 3B parameter model, Audio Flamingo 2, now rivals larger proprietary models in audio understanding and reasoning, even handling audio segments up to 5 minutes long.