Mar 31, 2026arXiv:2603.29693

Measuring the metacognition of AI

AI Summary

This paper advocates for using the meta-d' framework and signal detection theory (SDT) to rigorously measure metacognitive abilities in AI systems, specifically their ability to assess confidence and regulate decisions based on risk. They demonstrate the utility of these methods by evaluating the metacognitive sensitivity of GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508 on primary judgment tasks with confidence ratings and risk-modulated responses. The results show how meta-d' allows for comparisons of LLMs against optimality, across models, and across tasks, while SDT reveals whether LLMs adjust their decision-making based on varying risk levels.

Key Contribution

LLMs can be rigorously evaluated for metacognitive abilities like confidence assessment and risk-aware decision-making using psychophysical frameworks borrowed from human cognition research.

Abstract

A robust decision-making process must take into account uncertainty, especially when the choice involves inherent risks. Because artificial Intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems; i.e, their ability to assess the reliability of and regulate their own decisions. Hence, it is crucial to employ robust methods to measure the metacognitive abilities of AI. This paper is primarily a methodological contribution arguing for the adoption of the meta-d' framework, or its model-free alternatives, as the gold standard for assessing the metacognitive sensitivity of AIs--the ability to generate confidence ratings that distinguish correct from incorrect responses. Moreover, we propose to leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk. To demonstrate the practical utility of these psychophysical frameworks, we conduct two series of experiments on three large language models (LLMs)--GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508. In the first experiments, LLMs performed a primary judgment followed by a confidence rating. In the second, LLMs only performed the primary judgment, while we manipulated the risk associated with either response. On the one hand, applying the meta-d' framework allows us to conduct comparisons along three axes: comparing an LLM to optimality, comparing different LLMs on a given task, and comparing the same LLM across different tasks. On the other hand, SDT allows us to assess whether LLMs become more conservative when risks are high.

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Measuring the metacognition of AI

Related Papers