CMU ML | Cornell | FreshCognate | National Tutoring Observatory | Feb 18, 2026 | arXiv:2602.16571

Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset

Zhuqian Zhou, Kirk Vanacore, Bakhtawar Ahtisham, Jinsook Lee, Doug Pietrzak, Daryl Hedley, Ruth Schäfer, Chris Shaw, René F. Kizilcec

AI Summary

The paper introduces MathEd-PII, a benchmark dataset for PII detection in math tutoring dialogues, and addresses the "numeric ambiguity" problem, in which numbers in math expressions are incorrectly flagged as PII. The authors show that generic PII detection systems over-redact instructional content, which reduces dataset utility. Comparing a Presidio baseline with LLM-based approaches, they find that math-aware prompting substantially improves performance (F1: 0.821 vs. 0.379) and reduces numeric false positives.

Key Contribution

Generic PII detection tools reduce the utility of math tutoring datasets by over-redacting numbers, whereas math-aware LLM prompting preserves instructional content while still detecting PII.

Abstract

Large-scale sharing of dialogue-based data is instrumental for advancing the science of teaching and learning, yet rigorous de-identification remains a major barrier. In mathematics tutoring transcripts, numeric expressions frequently resemble structured identifiers (e.g., dates or IDs), leading generic Personally Identifiable Information (PII) detection systems to over-redact core instructional content and reduce dataset utility. This work asks how PII can be detected in math tutoring transcripts while preserving their educational utility. To address this challenge, we investigate the "numeric ambiguity" problem and introduce MathEd-PII, the first benchmark dataset for PII detection in math tutoring dialogues, created through a human-in-the-loop LLM workflow that audits upstream redactions and generates privacy-preserving surrogates. The dataset contains 1,000 tutoring sessions (115,620 messages; 769,628 tokens) with validated PII annotations. Using a density-based segmentation method, we show that false PII redactions are disproportionately concentrated in math-dense regions, confirming numeric ambiguity as a key failure mode. We then compare four detection strategies: a Presidio baseline and LLM-based approaches with basic, math-aware, and segment-aware prompting. Math-aware prompting substantially improves performance over the baseline (F1: 0.821 vs. 0.379) while reducing numeric false positives, demonstrating that de-identification must incorporate domain context to preserve analytic utility. This work provides both a new benchmark and evidence that utility-preserving de-identification for tutoring data requires domain-aware modeling.
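To make the numeric ambiguity problem concrete, the following toy sketch may help. It is not the paper's pipeline, Presidio, or an LLM prompt: the `NAIVE_DATE` pattern, the `MATH_CUE` heuristic, the `window` parameter, and the two example messages are all hypothetical, chosen only to show how a context-free rule can mistake fractions for dates and how nearby math context could be used to suppress those false positives.

```python
import re

# Hypothetical, deliberately naive date rule standing in for a generic PII detector.
NAIVE_DATE = re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")

# Hypothetical math-context cue: an operator symbol or an arithmetic word nearby.
MATH_CUE = re.compile(
    r"[=+\-*×÷^]|\b(?:plus|minus|times|divided|fraction|equals)\b",
    re.IGNORECASE,
)


def naive_detect(message):
    """Return every span the context-free rule would redact as a date."""
    return [m.group() for m in NAIVE_DATE.finditer(message)]


def math_aware_detect(message, window=15):
    """Keep only candidate spans with no math cue within `window` characters."""
    kept = []
    for m in NAIVE_DATE.finditer(message):
        before = message[max(0, m.start() - window):m.start()]
        after = message[m.end():m.end() + window]
        if not MATH_CUE.search(before + after):
            kept.append(m.group())
    return kept


def f1(predicted, gold):
    """F1 over sets of predicted and gold PII spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example messages; only the second contains real PII.
messages = [
    "So 3/4 + 1/2 equals 5/4, right?",
    "My birthday is 3/14/2012 and I'm in grade 6.",
]
gold = ["3/14/2012"]

naive = [span for msg in messages for span in naive_detect(msg)]
aware = [span for msg in messages for span in math_aware_detect(msg)]

print("naive:", naive, "F1 =", f1(naive, gold))
print("math-aware:", aware, "F1 =", f1(aware, gold))
```

On these two invented messages the naive rule flags all three fractions plus the birthday, while the context check keeps only the birthday. The scores it prints describe this toy input only and are unrelated to the F1 values reported in the abstract.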

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 33

Year: 2026

Venue: N/A
