Auckland, HKU, Melbourne, Monash, WHU
Jun 9, 2026
arXiv:2606.11260

RAIL: Rethinking Auditory Intelligence in Large Audio-Language Models with a CHC-Grounded Benchmark

Hongyu Jin, Siyi Wang, Yang Xiao, Jiaheng Dong, Shihong Tan, Kaiyuan Peng, Georgiana Juravle, Shanquan Chen, Gongping Huang, Hong Jia, Eun-Jung Holden, James Bailey, Ting Dang

AI Summary

This paper introduces RAIL, a novel evaluation paradigm for large audio-language models (LALMs) that integrates cognitive principles from the Cattell-Horn-Carroll (CHC) framework to assess auditory intelligence. By formalizing auditory cognition into five core capabilities and developing structured tasks, the authors reveal significant gaps in current models' performance across these cognitive abilities. The evaluation of 26 state-of-the-art LALMs demonstrates that existing models show uneven capabilities in processing, retaining, and integrating auditory information, highlighting the need for a more nuanced assessment approach.

Key Contribution

Current LALMs exhibit significant performance disparities across cognitive auditory capabilities, revealing a critical oversight in existing evaluation methods.

Abstract

Humans process rich auditory environments through tightly integrated cognitive capabilities such as audio perception, audio reasoning, and memory. Despite recent progress in large audio-language models (LALMs) across speech understanding and multimodal audio reasoning, current evaluation paradigms remain largely task- or modality-centric, focusing on end performance while overlooking underlying auditory cognitive behaviours. This reveals a fundamental gap between how auditory cognition is understood in humans and how it is evaluated in LALMs, particularly in the lack of frameworks that operationalise cognitive principles beyond task-level metrics to systematically capture model behaviour. In this work, we introduce RAIL, a human-centric evaluation paradigm grounded in the Cattell-Horn-Carroll (CHC) cognitive framework. RAIL formalises auditory cognition into five core capabilities and develops them into structured evaluation tasks that probe how models process, retain, and integrate auditory information. We further construct a cognitively grounded benchmark with principled data curation and human-aligned evaluation protocols. Evaluating 26 state-of-the-art LALMs, we find that current models exhibit highly uneven performance across cognitive abilities. RAIL establishes a new evaluation paradigm that moves beyond task-centric benchmarking toward cognitively grounded assessment of auditory intelligence.

Multimodal Models, Speech & Audio

