Nishit Anand

University of Maryland College Park, Indian Institute of Technology Delhi, Indraprastha Institute of Information Technology Delhi, Jaypee Institute of Information Technology Noida

Publication activity (papers/week, last 8 weeks)

Research focus

Speech & Audio (3), Multimodal Models (2), Open-Source Models & Weights (2), Computer Vision (1)

Frequent co-authors

R. Duraiswami (4), Dinesh Manocha (3), Sreyan Ghosh (2), Arushi Goel (2)

Papers (4)

Jul 17, 2026

NVIDIA · 1w ago · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio-Visual Flamingo: Open Audio-Visual Intelligence for Long and Complex Videos

AV-Flamingo outperforms existing models on complex audio-visual tasks, revealing that size isn't everything when it comes to reasoning capabilities.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +21

Multimodal Models, Speech & Audio

Apr 27, 2026

Apr 27, 2026 · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information

Learning Illumination Control in Diffusion Models

Open-source diffusion models can now achieve state-of-the-art illumination control rivaling closed-source alternatives, thanks to a novel training pipeline and dataset.

Nishit Anand, Manan Suri, Christopher Metzler +2

Computer Vision, Data Curation & Synthetic Data, Open-Source Models & Weights

Apr 13, 2026

NVIDIA · Apr 13, 2026 · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +17

Multimodal Models, Open-Source Models & Weights, Speech & Audio

Mar 31, 2026

Mar 31, 2026 · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information

Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models

LALMs can be easily tricked into "hearing" things that aren't there, with success rates as high as 95% on targeted attacks.

Ashish Seth, Sonal Kumar, Ramaneswaran Selvakumar +5

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, Speech & Audio
