Feb 24, 2026 | arXiv:2602.20527

A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies

Md Mirajul Islam, Xi Yang, Adittya Soukarjya Saha, Rajesh Debnath, Min Chi

AI Summary

This paper addresses the sample inefficiency of applying deep reinforcement learning (DRL) to educational technologies. It uses a generalized Apprenticeship Learning (AL) framework, THEMES, to infer pedagogical policies from expert student demonstrations. THEMES models how multiple reward functions underlying the student learning process evolve over time, which enables it to induce effective pedagogical policies. In experiments, THEMES outperforms six baselines, achieving an AUC of 0.899 and a Jaccard index of 0.653 when predicting student pedagogical decisions from only 18 trajectories collected in a previous semester.
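AUC and the Jaccard index are standard classification metrics. A minimal sketch of how they could be computed for pedagogical decisions follows. It assumes each decision is encoded as 0/1, that the model emits one score per decision, and that Jaccard is taken over the positive class. The paper's actual decision encoding, threshold, and averaging are not given on this page, and the function names are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    O(P * N) pairwise comparison, which is fine for small evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def jaccard(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Jaccard index over the positive class: |true AND pred| / |true OR pred|."""
    inter = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    union = sum(1 for y, p in zip(labels, preds) if y == 1 or p == 1)
    # Convention (assumption): if neither side has any positives, they agree fully.
    return inter / union if union else 1.0


if __name__ == "__main__":
    # Hypothetical toy data: 1 = the student chose a given pedagogical action.
    labels = [1, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.7, 0.6]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(roc_auc(labels, scores), jaccard(labels, preds))
```

AUC is threshold-free and measures ranking quality, while Jaccard depends on the chosen threshold. That is why the two numbers are reported separately.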

Key Contribution

Forget hand-crafting reward functions for your intelligent tutoring system; this apprenticeship learning approach learns evolving pedagogical strategies directly from expert student behavior.

Abstract

Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments such as intelligent tutoring systems (ITSs). Despite this success, the broader application of DRL to educational technologies has been limited by major challenges such as sample inefficiency and the difficulty of designing the reward function. In contrast, Apprenticeship Learning (AL) uses a few expert demonstrations to infer the expert's underlying reward functions and derive decision-making policies that generalize and replicate optimal behavior. In this work, we leverage a generalized AL framework, THEMES, to induce effective pedagogical policies by capturing the complexities of the expert student learning process, in which multiple reward functions may evolve dynamically over time. We evaluate THEMES against six state-of-the-art baselines and show that it outperforms them, highlighting its potential as a powerful alternative for inducing effective pedagogical policies: using only 18 trajectories from a previous semester, it predicts student pedagogical decisions in a later semester with an AUC of 0.899 and a Jaccard index of 0.653.
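To show concretely what "inferring the expert's reward from a few demonstrations" means, the sketch below illustrates the classic single-reward, linear form of AL from Abbeel and Ng's apprenticeship learning via inverse reinforcement learning. It is not THEMES, which handles multiple reward functions that evolve over time. It assumes a reward of the form R(s) = w·φ(s) and compares discounted feature expectations of expert demonstrations with those of a learner's rollouts. The feature vectors, discount factor, and helper names are illustrative assumptions.

```python
import math
from typing import List, Sequence

# One trajectory = a list of per-step state feature vectors phi(s_t).
Trajectory = List[Sequence[float]]


def feature_expectations(trajs: List[Trajectory], gamma: float) -> List[float]:
    """Empirical discounted feature expectations: (1/m) * sum_i sum_t gamma^t * phi(s_t^i)."""
    if not trajs or not trajs[0]:
        raise ValueError("need at least one non-empty trajectory")
    dim = len(trajs[0][0])
    mu = [0.0] * dim
    for traj in trajs:
        for t, phi in enumerate(traj):
            discount = gamma ** t
            for k in range(dim):
                mu[k] += discount * phi[k]
    return [x / len(trajs) for x in mu]


def reward_weights(mu_expert: Sequence[float], mu_policy: Sequence[float]) -> List[float]:
    """Unit-norm weights w pointing from the learner's feature expectations toward the expert's.

    This mirrors the first step of the projection method (w = mu_E - mu_0),
    normalized here so only the direction of the linear reward R(s) = w . phi(s) is kept.
    """
    diff = [e - p for e, p in zip(mu_expert, mu_policy)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return [0.0] * len(diff)  # learner already matches the expert's features
    return [d / norm for d in diff]


if __name__ == "__main__":
    # Hypothetical 2-feature states; two expert demonstrations, one learner rollout.
    expert = [[[1, 0], [1, 0]], [[1, 0], [0, 1]]]
    learner = [[[0, 1], [0, 1]]]
    mu_e = feature_expectations(expert, gamma=0.5)
    mu_p = feature_expectations(learner, gamma=0.5)
    print(mu_e, mu_p, reward_weights(mu_e, mu_p))
```

In the full projection method, w would be passed to an RL solver, the new policy's feature expectations recomputed, and the loop repeated until they fall within a tolerance of the expert's. Learning from only a handful of demonstrations in this way is the sample-efficiency argument the abstract makes for AL over reward-engineered DRL.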

RLHF & Preference Learning | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
