LALMs can be easily tricked into "hearing" things that aren't there, with success rates as high as 95% on targeted attacks.
A new model, TAC, achieves state-of-the-art audio and audio-visual reasoning by training on synthetic data to generate temporally grounded captions, which can then be fed into LLMs.
A 3B parameter model, Audio Flamingo 2, now rivals larger proprietary models in audio understanding and reasoning, even handling audio segments up to 5 minutes long.