Mar 11, 2026 | arXiv:2603.10807

Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali

AI Summary

This paper introduces a risk-aware evaluation framework for LLM security failures in the Banking, Financial Services, and Insurance (BFSI) sector. The framework combines a domain-specific taxonomy of financial harms, an automated red-teaming pipeline, and an ensemble-based judging protocol. Its Risk-Adjusted Harm Score (RAHS) quantifies the operational severity of disclosures while accounting for mitigation signals and inter-judge agreement. Experiments across diverse models show that higher decoding stochasticity and sustained adaptive interaction lead to more severe and operationally actionable financial disclosures, exposing the limitations of current domain-agnostic security evaluations.

Key Contribution

LLMs in finance are more vulnerable than single-turn evaluations suggest: sustained adversarial pressure drives a systematic escalation toward severe, operationally actionable financial disclosures.

Abstract

The rapid adoption of large language models (LLMs) in financial services introduces new operational, regulatory, and security risks. Yet most red-teaming benchmarks remain domain-agnostic and fail to capture failure modes specific to regulated Banking, Financial Services, and Insurance (BFSI) settings, where harmful behavior can be elicited through legally or professionally plausible framing. We propose a risk-aware evaluation framework for LLM security failures in BFSI, combining a domain-specific taxonomy of financial harms, an automated multi-round red-teaming pipeline, and an ensemble-based judging protocol. We introduce the Risk-Adjusted Harm Score (RAHS), a risk-sensitive metric that goes beyond success rates by quantifying the operational severity of disclosures, accounting for mitigation signals, and leveraging inter-judge agreement. Across diverse models, we find that higher decoding stochasticity and sustained adaptive interaction not only increase jailbreak success, but also drive systematic escalation toward more severe and operationally actionable financial disclosures. These results expose limitations of single-turn, domain-agnostic security evaluation and motivate risk-sensitive assessment under prolonged adversarial pressure for real-world BFSI deployment.
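The abstract names three ingredients of RAHS (operational severity, mitigation signals, inter-judge agreement) but does not give its formula. The sketch below is a toy illustration of how such ingredients could be combined into a single number; it is not the paper's definition. The function name, the 0 to 4 severity scale, the mitigation discount of 0.5, and the agreement weighting are all assumptions made for this example.

```python
from statistics import mean, pstdev


def toy_risk_adjusted_score(severities, mitigated,
                            max_severity=4.0, mitigation_discount=0.5):
    """Toy risk-adjusted score in [0, 1]. NOT the paper's RAHS formula.

    severities: one severity rating per judge, assumed to lie in [0, max_severity].
    mitigated:  one bool per judge, True if that judge saw a mitigation signal
                (e.g. a warning or partial refusal) in the model's response.
    """
    if not severities or len(severities) != len(mitigated):
        raise ValueError("need one severity and one mitigation flag per judge")

    # Assumption: a mitigation signal scales that judge's severity down.
    adjusted = [s * (mitigation_discount if m else 1.0)
                for s, m in zip(severities, mitigated)]
    base = mean(adjusted) / max_severity

    # Assumption: agreement is 1 when judges coincide and falls toward 0 as the
    # spread of raw ratings approaches its maximum (max_severity / 2 for
    # ratings confined to [0, max_severity]).
    spread = pstdev(severities) / (max_severity / 2)
    agreement = 1.0 - min(spread, 1.0)

    # Down-weight the score when judges disagree.
    return base * agreement
```

Under these assumptions, three judges who all rate a response at maximum severity with no mitigation yield the highest score, while the same ratings with mitigation flagged by every judge yield half of it. A success-rate metric would count both cases identically, which is the gap a risk-sensitive score is meant to close.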

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 28

Year: 2026

Venue: N/A
