UW, Cambricon Technologies, CAS, ICT CAS, Institute of AI for Industries, Northwestern, USC, UT Austin

May 26, 2026

arXiv:2605.26872

The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection

Zhengyu Hu, Zheyuan Xiao, Linxin Song, Fengqing Jiang, Yutai Li, Zhengyu Chen, Zhihan Xiong, Yue Liu, Junhao Lin, Yao Su, Lijie Hu, Kaize Ding, Xiao Teng, Radha Poovendran

AI Summary

This paper examines the common practice of using the highest-performing teacher LLM to generate training data for student models, arguing that teacher test performance is not always indicative of teaching quality. The authors introduce Student-Centric Answer Sampling (SCAS), a framework that selects among verified teacher-generated answers based on an estimated student-centric learning cost, using an efficient forward-only proxy motivated by a token-wise gradient decomposition. Experiments across 30 teacher models, 6 student base models, and 8 tasks show that SCAS consistently improves student performance by prioritizing supervision matched to the student's current learning state.

Key Contribution

The best LLM to answer a question isn't always the best LLM to *teach* the answer, and matching the "difficulty" of the explanation to the student's current abilities yields better learning.

Abstract

LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers provide correct answers to the same question, the answer from the strongest teacher is not necessarily the best supervision for a given student. To address this gap, we propose Student-Centric Answer Sampling (SCAS), a framework that selects from verified teacher-generated answers according to their estimated student-centric learning cost. Motivated by a token-wise gradient decomposition, we derive an efficient forward-only proxy for this cost and use it to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 8 tasks show that SCAS consistently improves student performance, suggesting that effective distillation should prioritize supervision matched to the current student rather than teacher strength alone.
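
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual proxy, so this example substitutes a stand-in cost: the mean negative log-likelihood of each answer under the current student, which needs only a forward pass. The `Candidate` class, the `proxy_cost` and `select_answer` functions, the teacher names, the log-probability values, and the rule of picking the lowest-cost answer are all assumptions made for illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    teacher: str                         # hypothetical teacher name
    answer: str                          # teacher-generated answer text
    verified: bool                       # True if the answer passed a correctness check
    student_token_logprobs: List[float]  # log p_student(token | prefix), from one forward pass


def proxy_cost(c: Candidate) -> float:
    """Stand-in cost (an assumption): mean negative log-likelihood under the student."""
    return -sum(c.student_token_logprobs) / len(c.student_token_logprobs)


def select_answer(candidates: List[Candidate]) -> Optional[Candidate]:
    """Keep only verified answers, then pick the one with the lowest stand-in cost."""
    pool = [c for c in candidates if c.verified and c.student_token_logprobs]
    if not pool:
        return None
    return min(pool, key=proxy_cost)


# Illustrative values only; none of these numbers come from the paper.
candidates = [
    Candidate("strong_teacher", "long formal derivation", True, [-2.0, -3.0, -2.5]),
    Candidate("mid_teacher", "short step-by-step answer", True, [-0.5, -1.0, -0.6]),
    Candidate("weak_teacher", "incorrect answer", False, [-0.1, -0.1]),
]

chosen = select_answer(candidates)
print(chosen.teacher if chosen else "no verified answer")
```

In this toy setup the unverified answer is excluded even though the student finds it most likely, and the choice among the remaining answers depends on the student's own token probabilities rather than on which teacher scores best on a benchmark.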

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
