Feb 19, 2026 · arXiv:2602.17262

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Kensuke Okada, Yui Furukawa, Kyosuke Bunji

AI Summary

This paper introduces a psychometric framework to quantify and mitigate socially desirable responding (SDR) in LLMs evaluated with questionnaires. SDR is quantified by administering the same inventory under HONEST and FAKE-GOOD instructions and computing a direction-corrected standardized effect size from latent scores estimated with item response theory (IRT). To mitigate SDR, the authors construct a graded forced-choice (GFC) Big Five inventory whose item pairs are matched on desirability through constrained optimization. Across nine instruction-tuned LLMs, GFC substantially reduces SDR relative to Likert-style questionnaires while largely preserving persona recovery.
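The effect-size step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the IRT latent scores (theta estimates) for one trait are already available for each condition, uses Cohen's d with a pooled standard deviation as the standardized effect size, and encodes the direction correction as a hypothetical `desirable_direction` sign (+1 when higher scores are socially desirable, -1 when lower scores are, as is usual for Neuroticism). The paper may standardize differently, for example with a paired design or the HONEST-condition SD.

```python
import math
from statistics import mean, stdev


def direction_corrected_d(honest, fake_good, desirable_direction):
    """Cohen's d (pooled SD) of FAKE-GOOD minus HONEST latent scores for one trait.

    Assumption (illustrative only): desirable_direction is +1 if higher trait
    scores are socially desirable and -1 if lower scores are. The sign flip makes
    positive values always mean "shifted toward the desirable pole".
    """
    if desirable_direction not in (1, -1):
        raise ValueError("desirable_direction must be +1 or -1")
    n1, n2 = len(honest), len(fake_good)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two scores per condition")
    s1, s2 = stdev(honest), stdev(fake_good)
    # Pooled standard deviation across the two instruction conditions.
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    raw_d = (mean(fake_good) - mean(honest)) / pooled_sd
    return desirable_direction * raw_d


# Hypothetical theta estimates for Neuroticism, where lower scores are desirable.
honest_n = [0.4, 0.9, 0.1, 0.6]
fake_n = [-0.8, -0.3, -1.1, -0.5]
print(direction_corrected_d(honest_n, fake_n, desirable_direction=-1))  # > 0
```

A positive value means the FAKE-GOOD instruction moved scores toward the socially desirable pole. Signing by direction puts traits with opposite desirable poles on a common scale, which is what allows comparisons across constructs and response formats.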

Key Contribution

LLMs exhibit consistently large socially desirable responding (SDR) on standard Likert-style questionnaires, but a carefully constructed, desirability-matched graded forced-choice inventory substantially attenuates this bias while largely preserving the intended persona profiles.
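How well a format captures the intended persona profiles can be summarized by a recovery statistic. This page does not say which statistic the authors use; the sketch below assumes a Pearson correlation between the known target levels and the estimated latent scores (for example, across synthetic personas for one trait). The function name `recovery_correlation` and the example values are hypothetical.

```python
import math


def recovery_correlation(target, estimated):
    """Pearson correlation between known target levels and estimated latent scores.

    Assumption (illustrative only): recovery is measured as a plain Pearson r.
    """
    if len(target) != len(estimated) or len(target) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mt = sum(target) / len(target)
    me = sum(estimated) / len(estimated)
    cov = sum((t - mt) * (e - me) for t, e in zip(target, estimated))
    var_t = sum((t - mt) ** 2 for t in target)
    var_e = sum((e - me) ** 2 for e in estimated)
    if var_t == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_e)


# Hypothetical example: five personas with target Conscientiousness levels
# (-1 = low, 0 = medium, +1 = high) and latent scores estimated from one format.
targets = [-1, -1, 0, 1, 1]
estimates = [-0.9, -0.4, 0.1, 0.7, 1.2]
print(round(recovery_correlation(targets, estimates), 3))
```

Values near 1 indicate that the format preserves the intended ordering of personas. Reporting such a statistic next to the SDR effect size for each model and format is one way to make the trade-off between reducing SDR and recovering profiles concrete.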

Abstract

Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in evaluative contexts, LLMs can instead gravitate toward socially preferred answers, a form of socially desirable responding (SDR) that biases questionnaire-derived scores and downstream conclusions. We propose a psychometric framework to quantify and mitigate SDR in questionnaire-based evaluation of LLMs. To quantify SDR, the same inventory is administered under HONEST versus FAKE-GOOD instructions, and SDR is computed as a direction-corrected standardized effect size from item response theory (IRT)-estimated latent scores. This enables comparisons across constructs and response formats, as well as against human instructed-faking benchmarks. For mitigation, we construct a graded forced-choice (GFC) Big Five inventory by selecting 30 cross-domain pairs from an item pool via constrained optimization to match desirability. Across nine instruction-tuned LLMs evaluated on synthetic personas with known target profiles, Likert-style questionnaires show consistently large SDR, whereas desirability-matched GFC substantially attenuates SDR while largely preserving the recovery of the intended persona profiles. These results highlight a model-dependent SDR-recovery trade-off and motivate SDR-aware reporting practices for questionnaire-based benchmarking and auditing of LLMs.
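The desirability-matching step can be illustrated with a simple heuristic. The paper selects its 30 cross-domain pairs via constrained optimization; the greedy procedure below is a swapped-in stand-in, not the authors' optimizer. It assumes each pool item carries a domain label and a scalar desirability rating (the `Item` fields and `select_matched_pairs` are hypothetical), pairs only items from different domains, uses each item at most once, caps how often each domain appears, and prefers pairs with the smallest desirability gap.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Item:
    item_id: str
    domain: str          # hypothetical domain label (e.g. a Big Five factor)
    desirability: float  # hypothetical scalar social-desirability rating


def select_matched_pairs(items, n_pairs):
    """Greedy heuristic (not the paper's optimizer): pick cross-domain pairs
    with the smallest desirability gaps under item-reuse and domain-balance caps."""
    domains = {it.domain for it in items}
    if not domains:
        raise ValueError("item pool is empty")
    # Each pair fills two domain slots; spread them evenly across domains.
    quota = math.ceil(2 * n_pairs / len(domains))
    candidates = [
        (abs(a.desirability - b.desirability), a, b)
        for a, b in combinations(items, 2)
        if a.domain != b.domain  # forced-choice pairs must cross domains
    ]
    candidates.sort(key=lambda c: c[0])  # smallest desirability gap first
    used, load, chosen = set(), {d: 0 for d in domains}, []
    for gap, a, b in candidates:
        if len(chosen) == n_pairs:
            break
        if a.item_id in used or b.item_id in used:
            continue  # each item appears in at most one pair
        if load[a.domain] >= quota or load[b.domain] >= quota:
            continue  # keep domains balanced
        chosen.append((a, b, gap))
        used.update((a.item_id, b.item_id))
        load[a.domain] += 1
        load[b.domain] += 1
    return chosen


# Hypothetical mini-pool: three toy domains, two items each.
pool = [
    Item("a1", "D1", 3.0), Item("a2", "D1", 4.0),
    Item("b1", "D2", 3.1), Item("b2", "D2", 1.0),
    Item("c1", "D3", 4.05), Item("c2", "D3", 1.2),
]
for a, b, gap in select_matched_pairs(pool, n_pairs=3):
    print(a.item_id, b.item_id, round(gap, 2))
```

With 30 pairs over the five Big Five domains, the cap works out to 12 appearances per domain. A greedy pass can return fewer than `n_pairs` pairs even when a feasible selection exists, so callers should check the result length; an exact formulation such as an integer program avoids that limitation.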

Constitutional AI & AI Ethics; Eval Frameworks & Benchmarks; Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
