Search papers, labs, and topics across Lattice.
Oracle America Inc. Correspondence: priyaranjanpattnayak@gmail.com Abstract Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages ( 2.1 Billion speakers), covering 45,216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in JSON, LLaMA and Sarvam exceed 0.920.92 JSR, and in Free all models reach ≈\approx1.0 with refusals collapsing. (2) English→Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized/mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (ρ≈0.28\rho\approx 0.28–0.320.32) indicating systematic effects. Human audits confirm detector reliability, and lite-to-full comparisons preserve conclusions. IJR offers a reproducible multilingual stress test revealing risks hidden by English-only, contract-focused evaluations, especially for South Asian users who frequently code-switch and romanize. IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages Priyaranjan Pattnayak1, Sanchari Chowdhuri1
1
1
3
7
LLM safety doesn't translate: evaluations across 12 Indic languages reveal alarming safety drift and inconsistent responses to sensitive topics.