Mar 1, 2026 · arXiv:2603.01239

Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence

AI Summary

The paper introduces and empirically investigates "Self-Anchoring Calibration Drift" (SACD), a phenomenon where LLMs exhibit systematic changes in confidence across multi-turn conversations when building on their own prior outputs. The study compares Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 across 150 questions in three conditions: single-turn, multi-turn self-anchoring, and independent repetition. Results show model-specific patterns, with Claude Sonnet 4.6 exhibiting decreasing confidence, GPT-5.2 showing increasing confidence in open-ended domains, and Gemini 3.1 Pro demonstrating SACD as a suppression of natural calibration improvement.

Key Contribution

LLMs can become systematically over- or under-confident as they build on their own outputs in multi-turn conversations, and this "self-anchoring calibration drift" can even prevent models from becoming better calibrated.
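Calibration in this sense compares a model's stated confidence with how often it is actually correct. The abstract reports it as expected calibration error (ECE), but this page does not give the paper's binning scheme or how confidence was elicited. The following is therefore only a minimal sketch of the standard equal-width-bin ECE; the function name, the `n_bins` default, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: bin-size-weighted mean of |accuracy - mean confidence|.

    confidences: floats in [0, 1] (stated confidence per answer)
    correct:     booleans (whether each answer was right)
    n_bins:      number of equal-width bins (an assumed default, not from the paper)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to an equal-width confidence bin;
    # a confidence of exactly 1.0 goes into the top bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example (made-up data): 90% stated confidence, 75% observed accuracy.
toy_ece = expected_calibration_error([0.9] * 4, [True, True, True, False])
print(round(toy_ece, 3))  # 0.15
```

Under this definition, drift toward over- or under-confidence shows up as ECE rising across turns, while improving calibration shows up as ECE falling.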

Abstract

We introduce Self-Anchoring Calibration Drift (SACD), a hypothesized tendency for large language models (LLMs) to show systematic changes in expressed confidence when building iteratively on their own prior outputs across multi-turn conversations. We report an empirical study comparing three frontier models -- Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 -- across 150 questions spanning factual, technical, and open-ended domains, using three conditions: single-turn baseline (A), multi-turn self-anchoring (B), and independent repetition control (C). Results reveal a complex, model-heterogeneous pattern that partially diverges from pre-registered hypotheses. Claude Sonnet 4.6 exhibited significant decreasing confidence under self-anchoring (mean CDS = -0.032, t(14) = -2.43, p = .029, d = -0.627), while also showing significant calibration error drift (F(4,56) = 22.77, p < .001, eta^2 = .791). GPT-5.2 showed the opposite pattern in open-ended domains (mean CDS = +0.026) with significant ECE escalation by Turn 5. Gemini 3.1 Pro showed no significant CDS (t(14) = 0.38, p = .710), but its Condition C data reveals a striking ECE pattern: without self-anchoring, Gemini's calibration error drops from .327 to near zero across repetitions, whereas self-anchoring holds ECE flat at approximately .333 -- indicating that SACD can manifest as suppression of natural calibration improvement rather than ac
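The reported effect size for Claude Sonnet 4.6 can be sanity-checked against the reported t statistic. For a one-sample or paired t-test, Cohen's d equals t divided by the square root of n, where n = df + 1. The sketch below assumes the test was of that type, which the abstract's t(14) suggests but does not state outright; the function name is hypothetical.

```python
import math


def cohens_d_from_t(t_stat, df):
    """Cohen's d implied by a one-sample (or paired) t statistic.

    Uses d = t / sqrt(n) with n = df + 1. This identity holds only for
    one-sample/paired designs, which is an assumption about the paper's test.
    """
    n = df + 1
    return t_stat / math.sqrt(n)


# Values reported in the abstract for Claude Sonnet 4.6: t(14) = -2.43
d_claude = cohens_d_from_t(-2.43, 14)
print(round(d_claude, 3))  # -0.627
```

The result agrees with the reported d = -0.627, so the t statistic and effect size are mutually consistent under that assumption.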

Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
