Google Research, Amirkabir University of Technology, Independent Researcher, TU Munich, University of Tehran

Feb 23, 2026

arXiv:2602.20135

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Mohammad Amanlou, Erfan Shafiee Moghaddam, Yasaman Amou Jafari, Mahdi Noori, Farhan Farsi, Behnam Bahrak

AI Summary

The paper introduces KNIGHT, a knowledge graph-driven framework that leverages LLMs to generate multiple-choice question (MCQ) datasets from external sources like Wikipedia/Wikidata. KNIGHT constructs a topic-specific knowledge graph to enable instructor-controlled difficulty levels, including multi-hop questions, without repeatedly processing the full source text, thereby reducing computational cost. Experiments across History, Biology, and Mathematics demonstrate that KNIGHT generates high-quality MCQs, as assessed by fluency, unambiguity, topic relevance, option uniqueness, and answerability, and produces model rankings consistent with MMLU benchmarks.

Key Contribution

KNIGHT replaces manual curation of evaluation datasets: it generates high-quality, multi-hop multiple-choice questions from a reusable knowledge graph with tunable difficulty, at lower token and compute cost.

Abstract

Large language models (LLMs) have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured and parsimonious summary of entities and relations, that can be reused to generate questions at instructor-controlled difficulty levels, including multi-hop questions, without re-feeding the full source text. This knowledge graph acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable graph representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.
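The idea of treating the knowledge graph as a reusable state and reading multi-hop questions off it can be illustrated with a minimal sketch. This is not the authors' implementation: the toy triples, the function names (`build_graph`, `k_hop_paths`, `make_mcq`), the templated question stem, and the use of hop count as the difficulty knob are all assumptions made for illustration. In the paper an LLM is used in generation; here a fixed template stands in for that step.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples; illustrative only, not the paper's data.
TRIPLES = [
    ("Marie Curie", "born in", "Warsaw"),
    ("Warsaw", "capital of", "Poland"),
    ("Poland", "continent", "Europe"),
    ("Albert Einstein", "born in", "Ulm"),
    ("Ulm", "located in", "Germany"),
    ("Germany", "continent", "Europe"),
]

def build_graph(triples):
    """Build the graph once; every later question is a cheap read over it."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)

def k_hop_paths(graph, start, hops):
    """Return all simple paths of exactly `hops` edges that begin at `start`."""
    frontier = [([], start, {start})]
    for _ in range(hops):
        expanded = []
        for steps, node, seen in frontier:
            for rel, obj in graph.get(node, []):
                if obj not in seen:  # avoid revisiting entities
                    expanded.append((steps + [(rel, obj)], obj, seen | {obj}))
        frontier = expanded
    return [steps for steps, _, _ in frontier]

def make_mcq(graph, start, hops, n_options=4):
    """Turn the first `hops`-hop path into an MCQ; hop count is the assumed difficulty knob."""
    paths = k_hop_paths(graph, start, hops)
    if not paths:
        return None  # no path of the requested length exists
    path = paths[0]
    answer = path[-1][1]
    # Distractors: other graph entities that are not on the path.
    entities = set(graph) | {obj for edges in graph.values() for _, obj in edges}
    excluded = {start} | {obj for _, obj in path}
    distractors = sorted(entities - excluded)[: n_options - 1]
    relations = " -> ".join(rel for rel, _ in path)
    stem = f"Starting from {start}, follow: {relations}. Which entity do you reach?"
    return {"stem": stem, "options": sorted([answer] + distractors), "answer": answer}

graph = build_graph(TRIPLES)
easy = make_mcq(graph, "Marie Curie", hops=1)
hard = make_mcq(graph, "Marie Curie", hops=3)
```

In this sketch the graph is built once and both calls reuse it, which mirrors the abstract's point that generation does not need the source text again. Raising `hops` yields a question that requires chaining more relations.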

Topics: Data Curation & Synthetic Data; Eval Frameworks & Benchmarks; Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
