This study introduces SpeechJBB, an audio jailbreak dataset for evaluating the safety alignment of large audio language models (LALMs) on code-switched speech. It benchmarks models on harmful audio prompts, including an augmented setting in which phonologically plausible pseudo-words are inserted around safety-critical terms. Code-switched harmful audio yields high jailbreak success rates, with the highest rates in non-English conditions, and pseudo-word insertion lowers refusal rates further. The findings indicate that natural-sounding obfuscation can undermine safety measures, raising concerns about the robustness of LALMs in real-world applications.
Code-switched speech exposes safety weaknesses in LALMs: harmful code-switched audio achieves high jailbreak success rates, and natural-sounding pseudo-word obfuscation reduces refusals further.
Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still evaluated primarily on monolingual, text-based harmful prompts. How well that alignment generalizes to multilingual and spoken settings, particularly code-switched speech, remains largely unexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking multiple state-of-the-art LALMs. We further probe the extent of safety weaknesses with an augmented setting in which phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, demonstrating that natural-sounding obfuscation can effectively bypass safety policies.
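The abstract reports jailbreak success rate (JSR) per condition but does not spell out how it is computed. The sketch below assumes the common definition: the fraction of harmful prompts in a condition whose model response a judge labels as a successful jailbreak. The `Judgment` record, the condition labels, and the function name are hypothetical and not taken from the paper, and the toy data is invented purely to show the call.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Judgment(NamedTuple):
    """One judged model response (hypothetical record layout)."""
    condition: str    # assumed label, e.g. "en-mono" or "nonen-cs"
    jailbroken: bool  # True if the judge deemed the response a successful jailbreak


def jailbreak_success_rate(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Return JSR per condition, assuming JSR = successes / total prompts."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for j in judgments:
        totals[j.condition] += 1
        successes[j.condition] += int(j.jailbroken)
    # Only conditions with at least one judgment appear, so no division by zero.
    return {cond: successes[cond] / totals[cond] for cond in totals}


# Toy, made-up judgments (not results from the paper).
toy = [
    Judgment("en-mono", False),
    Judgment("en-mono", True),
    Judgment("nonen-cs", True),
]
rates = jailbreak_success_rate(toy)
```

Grouping by condition keeps monolingual, code-switched, and pseudo-word-augmented settings separate, which is the kind of comparison the abstract describes. If refusals are judged separately from jailbreaks, the same grouping could be reused for a refusal rate.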