The paper introduces Math Education Digital Shadows (MEDS), a dataset of 28,000 personas from 14 LLMs, designed to map how LLMs reason about mathematics and which biases they show across human- and AI-like conditions. Each record includes psychological/sociodemographic persona metadata, four types of math tasks (an open interview, psychometric tests, cognitive networks, and high-school math questions), and reasoning and confidence scores. Data validation shows schema integrity and consistent personas, along with family-specific peculiarities such as negative math attitudes and overconfidence. The authors present MEDS as a resource for developing safer AI tutors.
LLMs exhibit surprisingly human-like biases and overconfidence in math, revealed by a new dataset mapping their mathematical reasoning across diverse personas.
To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows), a dataset mapping how large language models reason about and report mathematics across human- and AI-like conditions. MEDS involves 28,000 personas from 14 LLMs (from families like Mistral, Qwen, DeepSeek, Granite, Phi, and Grok) shadowing either humans or AI assistants. Each record/shadow includes a set of prompts along with psychological/sociodemographic persona metadata and four types of math tasks: (i) an open math interview, (ii) three psychometric tests about math perceptions with explanations, (iii) cognitive networks capturing math attitudes, and (iv) 18 high-school math test questions together with their reasoning and confidence scores. MEDS differs from traditional score-only math benchmarks because it integrates concepts of self-efficacy, math anxiety, and cognitive network science alongside math proficiency scores. Data validation shows that the sampled LLMs exhibit schema integrity and consistent personas, together with family-specific peculiarities like human-like negative math attitudes, logical fallacies, and math overconfidence. MEDS will benefit learning analytics experts, cognitive scientists, and developers of safer AI tutors in mathematics.
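As a rough illustration of the record layout described above, one shadow could be modeled as follows. This is a minimal sketch, not the published MEDS schema: every class and field name (`Shadow`, `MathAnswer`, `condition`, `confidence`, and so on) is hypothetical, and only the counts of three psychometric tests and 18 math questions come from the description.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class MathAnswer:
    """One high-school math test item (hypothetical field names)."""
    question: str
    reasoning: str     # the model's written reasoning
    answer: str
    confidence: float  # the scale of the confidence score is an assumption


@dataclass
class Shadow:
    """One persona record; the layout is assumed, not the real schema."""
    model: str                                # which of the 14 LLMs produced it
    condition: Literal["human", "ai_assistant"]
    persona: dict[str, str]                   # psychological/sociodemographic metadata
    prompts: list[str]
    interview: str                            # (i) open math interview
    psychometric_tests: list[dict]            # (ii) three tests with explanations
    cognitive_network: list[tuple[str, str]]  # (iii) edges between associated concepts
    math_answers: list[MathAnswer] = field(default_factory=list)  # (iv) 18 items

    def validate(self) -> list[str]:
        """Return a list of schema problems; an empty list means none found."""
        problems = []
        if len(self.psychometric_tests) != 3:
            problems.append("expected 3 psychometric tests")
        if len(self.math_answers) != 18:
            problems.append("expected 18 math test questions")
        return problems
```

A check like `validate()` is one simple way to express the kind of schema-integrity test the description mentions; the actual validation procedure may differ.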