Mila, McGill | Jun 4, 2026 | arXiv:2606.06037

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Virginia Ceccatelli, Yejin Jeon, David Ifeoluwa Adelani

AI Summary

This study introduces SpeechJBB, an audio jailbreak dataset for evaluating the safety alignment of large audio language models (LALMs) under code-switched speech. Across the models tested, code-switched harmful audio yields high jailbreak success rates, with non-English monolingual and non-English code-switched pairs the most successful. In an augmented setting, inserting phonologically plausible pseudo-words around safety-critical terms further reduces refusal rates, which indicates that natural-sounding obfuscation can bypass safety policies and raises concerns about LALM robustness in real-world applications.

Key Contribution

Code-switched harmful speech yields high jailbreak success rates in LALMs, and inserting pseudo-words around safety-critical terms further reduces refusals.

Abstract

Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on monolingual, text-based harmful prompts. This leaves their generalizability under multilingual and spoken settings, particularly code-switched speech, largely underexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking across multiple state-of-the-art LALMs. The extent of safety weaknesses is further probed by introducing an augmented setting where phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields substantially high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, which demonstrates that natural-sounding obfuscation can effectively bypass safety policies.
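The abstract reports jailbreak success rate (JSR) per language condition but does not give its formula. A minimal sketch, assuming the common definition (the fraction of harmful prompts whose response a judge labels as a successful jailbreak), might look like the following. The function name, condition labels, and example records are hypothetical illustrations, not data or code from the paper.

```python
from collections import defaultdict


def jailbreak_success_rate(records):
    """Return {condition: JSR}, assuming JSR = jailbroken responses / total prompts.

    `records` is an iterable of (condition, jailbroken) pairs, where
    `jailbroken` is a boolean judge label for one model response.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for condition, jailbroken in records:
        totals[condition] += 1
        successes[condition] += int(jailbroken)
    return {c: successes[c] / totals[c] for c in totals}


# Hypothetical judge labels for illustration only; not results from the paper.
example = [
    ("monolingual", False),
    ("monolingual", True),
    ("code_switched", True),
    ("code_switched", True),
]
rates = jailbreak_success_rate(example)
```

Under this assumed definition, the refusal rate mentioned in the abstract would be tracked as a separate per-response label, since a response can be neither a refusal nor a successful jailbreak.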

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
