This paper addresses the sample inefficiency that limits the application of deep reinforcement learning (DRL) to educational technologies. It uses a generalized Apprenticeship Learning (AL) framework called THEMES to infer pedagogical policies from expert student demonstrations. THEMES captures how multiple reward functions underlying the student learning process evolve over time, which enables the induction of effective pedagogical policies. In experiments, THEMES outperforms six baselines, achieving an AUC of 0.899 and a Jaccard index of 0.653 when predicting student pedagogical decisions using only 18 trajectories from a previous semester.
Forget hand-crafting reward functions for your intelligent tutoring system; this apprenticeship learning approach learns evolving pedagogical strategies directly from expert student behavior.
Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments such as intelligent tutoring systems (ITSs). Despite these successes, the broader application of DRL to educational technologies has been limited by major challenges such as sample inefficiency and the difficulty of designing the reward function. In contrast, Apprenticeship Learning (AL) uses a few expert demonstrations to infer the expert's underlying reward functions and derive decision-making policies that generalize and replicate optimal behavior. In this work, we leverage a generalized AL framework, THEMES, to induce effective pedagogical policies by capturing the complexities of the expert student learning process, where multiple reward functions may dynamically evolve over time. We evaluate THEMES against six state-of-the-art baselines and demonstrate its superior performance, highlighting its potential as a powerful alternative for inducing effective pedagogical policies. Using only 18 trajectories from a previous semester to predict student pedagogical decisions in a later semester, THEMES achieves an AUC of 0.899 and a Jaccard index of 0.653.
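For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and the Jaccard index can be computed for binary decisions. The toy labels, the scores, the 0.5 threshold, and the function names `jaccard_index` and `roc_auc` are illustrative assumptions; the abstract does not describe the paper's exact evaluation protocol.

```python
def jaccard_index(y_true, y_pred):
    """Jaccard index for binary labels: |intersection| / |union| of the positive class."""
    both = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    either = sum(1 for t, p in zip(y_true, y_pred) if t == 1 or p == 1)
    # Convention (an assumption): two empty positive sets count as a perfect match.
    return both / either if either else 1.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 and 0 stand for two possible decisions.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]  # model scores for decision 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(jaccard_index(y_true, y_pred))  # overlap of predicted and true positives
print(roc_auc(y_true, scores))        # ranking quality, independent of the threshold
```

AUC depends only on how the scores rank positives against negatives, while the Jaccard index depends on the thresholded predictions, so the two numbers capture different aspects of prediction quality.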