Feb 19, 2026 | arXiv:2602.17542

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Zhangqi Duan, Arnav Kankaria, Dhruv Kartik, Andrew Lan

AI Summary

This paper introduces a framework that uses LLMs to automatically label the correctness of knowledge components (KCs) in student-written code for open-ended programming problems, addressing the scarcity of KC-level labels in real-world datasets. The method incorporates a temporal context-aware Code-KC mapping to improve alignment between KCs and each student's code. Experiments show that the LLM-generated labels yield better learning curve fit and predictive performance than baselines, and a human evaluation finds substantial agreement between LLM and expert annotations.

Key Contribution

LLMs can automatically label the correctness of fine-grained programming skills in student code, in substantial agreement with expert annotations, producing learning curves that fit better than those from baseline labels.

Abstract

Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming tasks where solutions typically involve multiple KCs simultaneously. Simply propagating problem-level correctness to all associated KCs obscures partial mastery and often leads to poorly fitted learning curves. To address this challenge, we propose an automated framework that leverages large language models (LLMs) to label KC-level correctness directly from student-written code. Our method assesses whether each KC is correctly applied and further introduces a temporal context-aware Code-KC mapping mechanism to better align KCs with individual student code. We evaluate the resulting KC-level correctness labels in terms of learning curve fit and predictive performance using the power law of practice and the Additive Factors Model. Experimental results show that our framework leads to learning curves that are more consistent with cognitive theory and improves predictive performance, compared to baselines. Human evaluation further demonstrates substantial agreement between LLM and expert annotations.
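The two evaluation tools named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard textbook forms, namely a power law of practice written as error = a * t^(-b) over practice opportunities t, and an Additive Factors Model in which the log-odds of a correct response are the student proficiency plus, for each KC involved, an easiness term and a learning rate multiplied by the number of prior opportunities. The function names `fit_power_law` and `afm_probability` and all numeric values are hypothetical.

```python
import math


def afm_probability(theta, betas, gammas, opportunities):
    """Additive Factors Model (standard form, assumed here).

    logit P(correct) = theta + sum_k (beta_k + gamma_k * T_k)
    theta: student proficiency; beta_k: KC easiness;
    gamma_k: KC learning rate; T_k: prior practice opportunities on KC k.
    Only the KCs involved in the item should be passed in.
    """
    logit = theta + sum(
        b + g * t for b, g, t in zip(betas, gammas, opportunities)
    )
    return 1.0 / (1.0 + math.exp(-logit))


def fit_power_law(error_rates):
    """Fit error = a * t**(-b) by least squares in log-log space.

    error_rates[i] is the observed error rate at opportunity t = i + 1
    and must be strictly positive. Returns (a, b).
    """
    xs = [math.log(t) for t in range(1, len(error_rates) + 1)]
    ys = [math.log(e) for e in error_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


if __name__ == "__main__":
    # Hypothetical per-opportunity error rates for one KC.
    a, b = fit_power_law([0.60, 0.45, 0.38, 0.33, 0.30])
    print(f"a={a:.3f}, b={b:.3f}")
    # Hypothetical student and a single KC after 3 prior opportunities.
    print(afm_probability(theta=0.2, betas=[-0.5], gammas=[0.3], opportunities=[3]))
```

Under these assumptions, a positive fitted b means the error rate falls with practice, which is the shape cognitive theory predicts. Labels that propagate problem-level correctness to every KC tend to flatten or distort that curve, which is the motivation the abstract gives for KC-level labeling.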

Code Generation & Program Synthesis | Eval Frameworks & Benchmarks | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
