The paper introduces Personalized RAG for Education (PRAG-EDU), a context-aware retrieval-augmented generation framework that tailors LLM responses to a student's academic proficiency by using historical module grades as pedagogical signals. It also contributes a benchmark of 250 expert-validated question-answer pairs, each linked to an academic profile, for evaluating grade-aware educational RAG in the AI domain. In experiments with seven open-source LLMs, PRAG-EDU improves BERTScore F1 by 23.7% and ROUGE-L by 18.3% over non-personalized baselines, and expert ratings averaging 4.09/5 support its pedagogical efficacy.
LLMs can be personalized for education by conditioning RAG on student grades, leading to significant gains in response accuracy and pedagogical efficacy.
While retrieval‐augmented generation (RAG) systems have substantially improved the factual accuracy of Large Language Models (LLMs) in educational contexts, they share a fundamental limitation: they cannot adapt responses to a student's specific academic proficiency. This gap is particularly critical in Artificial Intelligence (AI) education, where learners' foundational knowledge of mathematics, programming, and core AI concepts varies widely. To address this, we introduce Personalized RAG for Education (PRAG‐EDU), a novel context‐aware RAG framework that dynamically calibrates response complexity by leveraging students' historical module grades as pedagogical signals. Unlike conventional RAG implementations that treat all learners uniformly, our model integrates these academic profiles with retrieved course materials to generate responses tailored to individual proficiency levels. We establish the first benchmark for grade‐aware educational RAG within the AI domain, comprising 250 expert‐validated question‐answer pairs linked to specific academic profiles and difficulty‐calibrated reference responses. Through a rigorous evaluation of seven open‐source LLMs against our framework, we demonstrate that PRAG‐EDU achieves a 23.7% improvement in BERTScore F1 (0.555 vs. 0.451) and 18.3% higher ROUGE‐L over non‐personalized baselines. A qualitative analysis of 250 student evaluations further confirms its pedagogical efficacy, with expert raters awarding an average of 4.09/5 stars, significantly outperforming the next‐best model (Qwen3:1.7B at 3.94). This work also reveals a notable trade‐off between factual alignment and generative fluency: our method leads in accuracy, while a model such as Smollm2:1.7B excels in expressiveness.
Ultimately, this research bridges the gap between technical RAG implementations and domain‐specific educational theory by operationalizing academic performance data as a personalization mechanism, offering a scalable solution for heterogeneous AI engineering classrooms.
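To make the idea of grade-conditioned retrieval concrete, the following is a minimal sketch of how module grades might be combined with retrieved course material in a single prompt. It is not the PRAG-EDU implementation: the grade thresholds, band names, style instructions, the word-overlap retriever, and the function names `proficiency_band`, `retrieve`, and `build_prompt` are all illustrative assumptions, since the abstract does not specify these details.

```python
from statistics import mean

# Hypothetical grade bands: (minimum mean grade, band name, style instruction).
# The thresholds and wording are illustrative assumptions, not values from the paper.
BANDS = [
    (70.0, "advanced", "Use precise terminology and include formal detail."),
    (50.0, "intermediate", "Define key terms briefly and give one worked example."),
    (0.0, "foundational", "Use plain language, avoid jargon, and build up step by step."),
]


def proficiency_band(module_grades: dict[str, float]) -> tuple[str, str]:
    """Map the mean of a student's module grades (0-100) to (band, instruction)."""
    if not module_grades:
        raise ValueError("at least one module grade is required")
    avg = mean(module_grades.values())
    for threshold, band, instruction in BANDS:
        if avg >= threshold:
            return band, instruction
    # Grades below every threshold fall back to the lowest band.
    return BANDS[-1][1], BANDS[-1][2]


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question.

    A real system would likely use dense embeddings; this stand-in only
    keeps the sketch self-contained.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(
    question: str,
    module_grades: dict[str, float],
    passages: list[str],
    k: int = 2,
) -> str:
    """Combine the academic profile and retrieved passages into one LLM prompt."""
    band, instruction = proficiency_band(module_grades)
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        f"Student proficiency: {band}\n"
        f"Answering style: {instruction}\n"
        f"Course material:\n{context}\n"
        f"Question: {question}\n"
        "Answer using only the course material above."
    )


if __name__ == "__main__":
    course_passages = [
        "Gradient descent updates parameters along the negative gradient.",
        "A stack is a LIFO data structure.",
        "Backpropagation computes the gradient of the loss.",
    ]
    grades = {"mathematics": 48.0, "programming": 55.0}
    print(build_prompt("what is gradient descent", grades, course_passages))
```

The resulting string would be passed to whichever LLM is under evaluation. Under this sketch, two students asking the same question receive the same retrieved material but different style instructions, which is the kind of uniform-versus-personalized contrast the abstract describes.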