This paper examines the common practice of using the highest-performing teacher LLM to generate training data for student models, arguing that a teacher's test performance does not always reflect its teaching quality. The authors introduce Student-Centric Answer Sampling (SCAS), a framework that selects among teacher-generated answers by an estimated student-centric learning cost, computed with an efficient forward-only proxy motivated by a token-wise gradient decomposition. Experiments across various teacher models, student models, and tasks show that SCAS consistently improves student performance by prioritizing supervision matched to the student's current learning state.
The best LLM to answer a question isn't always the best LLM to *teach* the answer, and matching the "difficulty" of the explanation to the student's current abilities yields better learning.
LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers provide correct answers to the same question, the answer from the strongest teacher is not necessarily the best supervision for a given student. To address this gap, we propose Student-Centric Answer Sampling (SCAS), a framework that selects from verified teacher-generated answers according to their estimated student-centric learning cost. Motivated by a token-wise gradient decomposition, we derive an efficient forward-only proxy for this cost and use it to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 8 tasks show that SCAS consistently improves student performance, suggesting that effective distillation should prioritize supervision matched to the current student rather than teacher strength alone.
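The abstract does not spell out the forward-only proxy, so the following is only a minimal sketch of what "selecting verified answers by a student-centric cost" could look like, under stated assumptions. It relies on one standard identity: for token cross-entropy, the gradient with respect to the student's logits is softmax(logits) minus the one-hot target, so its squared norm, sum(p_i^2) - 2*p_y + 1, can be computed from a forward pass alone. Using that norm as the per-token cost, averaging it over the answer, and picking the lowest-cost candidate (or sampling with a temperature) are illustrative assumptions, not the SCAS method itself. The helpers `token_grad_norm_sq`, `answer_cost`, and `select_answer` are hypothetical, and candidates are assumed to be already verified as correct.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# A candidate is (per-step student logits, teacher answer token ids).
# Hypothetical format: in practice the logits come from one forward pass
# of the student over the teacher's verified answer.
Candidate = Tuple[Sequence[Sequence[float]], Sequence[int]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def token_grad_norm_sq(logits: Sequence[float], target: int) -> float:
    """Squared norm of d(-log p_target)/d(logits) = ||softmax - onehot||^2.

    Expands to sum(p_i^2) - 2 * p_target + 1, so no backward pass is needed.
    """
    p = softmax(logits)
    return sum(pi * pi for pi in p) - 2.0 * p[target] + 1.0


def answer_cost(step_logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Assumed cost: mean per-token logit-gradient squared norm (range [0, 2))."""
    if not tokens:
        raise ValueError("answer must contain at least one token")
    total = sum(token_grad_norm_sq(z, y) for z, y in zip(step_logits, tokens))
    return total / len(tokens)


def select_answer(
    candidates: Sequence[Candidate],
    temperature: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of the chosen verified answer.

    temperature <= 0: pick the lowest-cost answer (assumed selection rule).
    temperature > 0: sample with probability proportional to exp(-cost / T).
    """
    costs = [answer_cost(logits, toks) for logits, toks in candidates]
    if temperature <= 0:
        return min(range(len(costs)), key=costs.__getitem__)
    rng = rng or random.Random()
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
    easy = (logits, [0, 1])  # student already predicts these tokens well
    hard = (logits, [2, 2])  # tokens the student finds unlikely
    print(select_answer([easy, hard]))  # index of the lower-cost answer
```

Averaging rather than summing keeps long answers from being penalized purely for length; whether the real proxy normalizes this way, or prefers low-cost answers at all rather than ones of intermediate difficulty, is not stated in the abstract.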