Dinesh Manocha

Papers on Lattice

Research focus

Speech & Audio (5) · Multimodal Models (4) · Eval Frameworks & Benchmarks (2) · Interpretability & Mechanistic Interp (2)

Frequent co-authors

Sreyan Ghosh (4) · Kaousheik Jayakumar (4) · Nishit Anand (3) · Ramani Duraiswami (3)

Papers (7)

Jul 17, 2026

NVIDIA · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio-Visual Flamingo: Open Audio-Visual Intelligence for Long and Complex Videos

AV-Flamingo outperforms existing models on complex audio-visual tasks, suggesting that model size alone does not determine reasoning capability.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +21

Multimodal Models · Speech & Audio

Jun 16, 2026

A Closer Look at Failure Modes in Temporal Understanding of Large Audio-Language Models

LALMs can improve their temporal reasoning accuracy by 3.2% simply by redistributing attention more effectively across audio tokens instead of relying on textual cues.

Apoorva Kulkarni, Kaousheik Jayakumar, Sreyan Ghosh +3

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Speech & Audio

Jun 16, 2026

VEGA: Learning Navigation VLAs from In-the-Wild Egocentric Video with Geometric Trajectory Supervision

Video-derived geometric supervision cuts collisions by 33% and increases navigation success by 150% for VLAs, a substantial gain for obstacle-aware navigation.

Gershom Seneviratne, Yohan Abeysinghe, Jianyu An +2

Multimodal Models · Robotics & Embodied AI

Apr 13, 2026

NVIDIA · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Audio-language models can now reason over 30-minute audio clips with timestamp-grounded intermediate steps, enabling more fine-grained understanding.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +17

Multimodal Models · Open-Source Models & Weights · Speech & Audio

Apr 9, 2026

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

Steering vectors work primarily through the attention output-value (OV) circuit rather than by re-weighting attention scores, and they can be heavily sparsified without losing effectiveness.

Stephen Cheng, Sarah Wiegreffe +1

Interpretability & Mechanistic Interp · Red-Teaming & Adversarial Robustness · RLHF & Preference Learning

Apr 3, 2026

Do Audio-Visual Large Language Models Really See and Hear?

AVLLMs may "hear" at intermediate layers, but they largely ignore audio cues in favor of vision when generating text, revealing a fundamental modality bias.

Ramaneswaran Selvakumar, Kaousheik Jayakumar, S. Sakshi +3

Interpretability & Mechanistic Interp · Multimodal Models · Speech & Audio

Mar 31, 2026

IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information

Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models

LALMs can be easily tricked into "hearing" things that aren't there, with success rates as high as 95% on targeted attacks.

Ashish Seth, Sonal Kumar, Ramaneswaran Selvakumar +5

Eval Frameworks & Benchmarks · Red-Teaming & Adversarial Robustness · Speech & Audio
