Mar 4, 2026 · arXiv:2603.03855

A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs

Taehan Lee, Jae-Hyung Jung, Jaehan Jung, Hyukjun Lee

AI Summary

This paper presents a large-scale sensitivity analysis of multi-event audio grounding in Audio LLMs, evaluating them on complex acoustic scenes drawn from 71K AudioCapsV2 clips. The study constructs present-event queries to assess event grounding and absent-event queries to assess hallucination, and tests four SOTA Audio LLMs with 12 prompt variants. Results show that increasing the event count lowers true-positive rates and raises false-positive rates, that prompt choice induces a strong trade-off between the two, and that models become more uncertain on multi-event audio.
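The two rates mentioned above can be computed per event count from yes/no answers. The following is a minimal sketch, not the authors' evaluation code: the record layout `(event_count, query_type, answer)`, the function name `rates_by_event_count`, and the toy records are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict


def rates_by_event_count(records):
    """Group yes/no answers by the number of events in the clip.

    records: iterable of (event_count, query_type, answer) tuples, where
    query_type is "present" or "absent" and answer is "yes" or "no".
    This layout is a hypothetical one chosen for the sketch.
    """
    # For each event count, track [yes_answers, total_queries] per query type.
    counts = defaultdict(lambda: {"present": [0, 0], "absent": [0, 0]})
    for event_count, query_type, answer in records:
        cell = counts[event_count][query_type]
        cell[0] += answer == "yes"
        cell[1] += 1

    rates = {}
    for event_count, c in sorted(counts.items()):
        yes_p, total_p = c["present"]
        yes_a, total_a = c["absent"]
        rates[event_count] = {
            # "yes" on a present-event query is a true positive.
            "tpr": yes_p / total_p if total_p else None,
            # "yes" on an absent-event query is a false positive.
            "fpr": yes_a / total_a if total_a else None,
        }
    return rates


# Made-up toy records, only to show the call shape.
toy_records = [
    (1, "present", "yes"), (1, "present", "yes"),
    (1, "absent", "no"), (1, "absent", "no"),
    (3, "present", "yes"), (3, "present", "no"),
    (3, "absent", "yes"), (3, "absent", "no"),
]
toy_rates = rates_by_event_count(toy_records)
```

Grouping by event count before computing the rates is what makes the sensitivity to scene complexity visible; pooling all clips would hide it.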

Key Contribution

Audio LLMs struggle to reliably ground events in complex acoustic scenes: as the number of events in a clip increases, true-positive rates fall and false-positive rates rise.

Abstract

Audio LLMs have shown a strong ability to understand audio samples, yet their reliability in complex acoustic scenes remains under-explored. Unlike prior work limited to small scale or less controlled query construction, we present a large-scale evaluation of event grounding and false alarms as auditory scene complexity increases. Using 71K AudioCapsV2 clips, we extract normalized (source, attribute) events and build two query types: present-event queries for ground-truth detection and absent-event queries to probe hallucinations, using similarity-filtered negative sampling in an audio-aligned text embedding space. We evaluate four SOTA Audio LLMs with 12 prompt variants over 500K yes/no queries per model. Across models, increasing event count consistently lowers true-positive rate and raises false-positive rate, while prompts induce a strong trade-off between the two. Our confidence analysis shows that models become more uncertain on multi-event audio, revealing room for improvement.
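The absent-event query construction can be illustrated with a small sketch of similarity-filtered negative sampling. The abstract does not give the filtering rule, so this sketch assumes that a candidate event is rejected when its cosine similarity to any present event reaches a threshold. The function name `sample_absent_events`, the threshold `max_sim=0.5`, and the three-dimensional toy vectors (standing in for an audio-aligned text embedding) are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_absent_events(present, vocabulary, embed, max_sim=0.5, k=1, seed=0):
    """Sample up to k events that are absent from the clip.

    Assumed rule: keep a candidate only if its similarity to every
    present event is below max_sim, so near-synonyms are not used
    as negatives.
    """
    rng = random.Random(seed)
    candidates = [
        event for event in vocabulary
        if event not in present
        and all(cosine(embed[event], embed[p]) < max_sim for p in present)
    ]
    return rng.sample(candidates, min(k, len(candidates)))


# Toy (source, attribute) events with made-up embedding vectors.
TOY_EMBED = {
    ("dog", "barking"): [1.0, 0.0, 0.0],
    ("dog", "howling"): [0.9, 0.1, 0.0],
    ("car", "honking"): [0.0, 1.0, 0.0],
    ("rain", "falling"): [0.0, 0.0, 1.0],
}
present_events = [("dog", "barking")]
negatives = sample_absent_events(
    present_events, list(TOY_EMBED), TOY_EMBED, max_sim=0.5, k=2
)
```

In this toy setup `("dog", "howling")` is filtered out because it lies too close to the present event, which leaves only clearly distinct events as absent-event queries.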

Eval Frameworks & Benchmarks · Multimodal Models · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
