Chung-Ang | May 27, 2026 | arXiv:2605.28188

Framing Matters: Addressing Framing Sensitivity in Decision-Making through Behaviorally-Grounded Value Alignment

Seojin Hwang, Minju Kim, Junhyuk Choi, JeongHyun Park, Hwanhee Lee

AI Summary

The paper introduces Fragile, a benchmark to evaluate the framing sensitivity of LLMs across value-tinted narration, temporal slice, and narrative vividness. Experiments using Fragile reveal that LLMs exhibit significant framing sensitivity, with a 28.6% average decision flip rate, and that common interventions exacerbate the problem. To mitigate this, the authors propose Valign, a representation-level method that anchors decisions to a stable value prior and projects out framing-sensitive directions from hidden states, reducing decision flips.
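The decision flip rate quoted above can be illustrated with a small sketch. The exact definition used by Fragile is not given on this page, so the code assumes a flip is any framed variant whose decision differs from the decision on the unframed baseline version of the same case. The function name, data layout, and example labels are all hypothetical.

```python
from typing import Dict, List


def flip_rate(baseline: Dict[str, str], framed: Dict[str, List[str]]) -> float:
    """Fraction of framed variants whose decision differs from the baseline.

    Assumption (not from the paper): a "flip" is counted per framed variant,
    by comparing its decision with the decision on the baseline input.
    """
    total = 0
    flips = 0
    for case_id, base_decision in baseline.items():
        for decision in framed.get(case_id, []):
            total += 1
            if decision != base_decision:
                flips += 1
    # Avoid division by zero when there are no framed variants at all.
    return flips / total if total else 0.0


# Hypothetical toy data: two cases, three framed variants each.
baseline = {"case1": "guilty", "case2": "not guilty"}
framed = {
    "case1": ["guilty", "not guilty", "guilty"],
    "case2": ["not guilty", "not guilty", "guilty"],
}
rate = flip_rate(baseline, framed)  # 2 flips out of 6 variants
```

Under this assumed definition, a rate of 28.6% would mean that a little over one in four framed variants changes the model's decision even though the underlying facts are unchanged.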

Key Contribution

LLMs are highly susceptible to fact-preserving changes in framing, flipping decisions 28.6% of the time on average, and simple prompt-level and activation-level interventions amplify this sensitivity rather than suppress it.

Abstract

Large Language Models (LLMs) are increasingly deployed in high-stakes decision-making settings such as legal reasoning, where consistency under factually equivalent inputs is critical. However, we find that fact-preserved but differently framed inputs can significantly destabilize LLM decisions. To systematically investigate this problem, we introduce Fragile, a large-scale benchmark that isolates fact-preserving semantic framing across three controlled dimensions: value-tinted narration, temporal slice, and narrative vividness. Our experiments reveal a high susceptibility of LLMs to framing, with an average decision flip rate of 28.6%. We find that simple prior prompt-level and activation-level interventions not only fail to suppress framing sensitivity but actively amplify it. We therefore propose Valign, a representation-level method that explicitly targets these framing dimensions by anchoring decisions to a stable value prior, steering hidden states toward the model's value-consistent direction, and projecting out temporal-vividness-sensitive directions from the model's hidden states. Valign consistently reduces framing-induced decision flips, demonstrating that robust mitigation requires directly targeting the internal pathways in which framing operates.
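The two representation-level operations named in the abstract, steering a hidden state along a direction and projecting a direction out of it, can be sketched with plain vector arithmetic. This is a generic illustration, not Valign itself: the abstract does not say how the directions are estimated, which layers are edited, or how strong the steering is, so the vectors, the `alpha` coefficient, and the function names below are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(hidden: Vector, direction: Vector) -> Vector:
    """Remove the component of `hidden` that lies along `direction`.

    Computes h' = h - (h . u) u, where u is `direction` normalized to unit length.
    """
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        return list(hidden)  # nothing to remove for a zero direction
    unit = [x / norm for x in direction]
    coeff = dot(hidden, unit)
    return [h - coeff * u for h, u in zip(hidden, unit)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift `hidden` along `direction` by a (hypothetical) strength `alpha`."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Hypothetical 3-dimensional example; real hidden states are much larger.
hidden = [2.0, 1.0, -1.0]
framing_dir = [1.0, 0.0, 0.0]  # assumed framing-sensitive direction
value_dir = [0.0, 1.0, 0.0]    # assumed value-consistent direction

cleaned = project_out(hidden, framing_dir)
steered = steer(cleaned, value_dir, alpha=0.5)
```

After `project_out`, the hidden state has zero component along the assumed framing direction, so a downstream readout can no longer depend on it; `steer` then nudges the state toward the assumed value-consistent direction.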

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
