Ruohan Gao

Research focus

Multimodal Models (2), Speech & Audio (2), Interpretability & Mechanistic Interp (1), Computer Vision (1)

Frequent co-authors

Ramaneswaran Selvakumar (1), Kaousheik Jayakumar (1), S. Sakshi (1), Sreyan Ghosh (1)

Papers (2)

Apr 3, 2026

Do Audio-Visual Large Language Models Really See and Hear?

Audio-visual large language models (AVLLMs) may "hear" at intermediate layers, but when generating text they largely ignore audio cues in favor of vision, revealing a fundamental modality bias.

Ramaneswaran Selvakumar, Kaousheik Jayakumar, S. Sakshi +3

Interpretability & Mechanistic Interp, Multimodal Models, Speech & Audio

Mar 30, 2026

SonoWorld: From One Image to a 3D Audio-Visual Scene

SonoWorld turns a single image into a navigable 3D scene complete with spatial audio, opening the door to richer immersive experiences.

Derong Jin, Xiyi Chen +3

Computer Vision, Multimodal Models, Speech & Audio