Siddharth Gururani

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (3), Open-Source Models & Weights (1), Speech & Audio (1), Eval Frameworks & Benchmarks (1)

Frequent co-authors

Ramani Duraiswami (2), Mohammad Shoeybi (2), Sreyan Ghosh (1), Arushi Goel (1)

Papers (3)

Apr 13, 2026

NVIDIA · 2w ago · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +17

Multimodal Models, Open-Source Models & Weights, Speech & Audio

Mar 14, 2026

NVIDIA · Mar 14, 2026 · also UMD

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Current multimodal models are surprisingly bad at understanding long, complex videos, struggling to integrate audio, visual, and text cues even for basic reasoning tasks.

Vatsal Agarwal, Katie Lyons, James Case +6

Eval Frameworks & Benchmarks, Multimodal Models, Reasoning & Chain-of-Thought

Oct 28, 2025

NVIDIA · Oct 28, 2025 · also BUPT, Cohere, Georgia Tech, KAIST +5

World Simulation with Video Foundation Models for Physical AI

Forget synthetic data that looks like it came from a PS2 game: NVIDIA's new Cosmos-Predict2.5 generates high-fidelity videos for training embodied AI, opening the door to more realistic and reliable simulations.

NVIDIA: Arslan Ali, Junjie Bai, Maciej Bala +8536

Multimodal Models, Robotics & Embodied AI, World Models & Planning
