Mar 3, 2026 | arXiv:2603.03066

EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui

AI Summary

The paper introduces EduAIGV-1k, a benchmark dataset for assessing the quality of AI-generated videos (AIGVs) designed to teach foundational math concepts. The dataset contains 1,130 videos generated by ten text-to-video models from 113 pedagogy-oriented prompts, annotated with fine-grained labels for perceptual quality (spatial and temporal fidelity) and prompt alignment (word-level and sentence-level accuracy). To evaluate AIGV quality, the authors propose EduVQA, which incorporates a Structured 2D Mixture-of-Experts (S2D-MoE) module, and report that it consistently outperforms existing VQA baselines.
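The annotation layout described above can be pictured as one record per video. The following is only an illustrative sketch: the class name `VideoAnnotation`, its field names, the score types, and the helper `enumerate_videos` are hypothetical and not taken from the paper or its code. It also assumes one video per (prompt, model) pair, which is consistent with the stated counts (113 prompts, ten models, 1,130 videos) but is not confirmed by the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass
class VideoAnnotation:
    """Hypothetical record for one annotated video (field names are assumptions)."""
    prompt_id: int
    model_id: int
    spatial: float                    # perceptual quality: spatial fidelity
    temporal: float                   # perceptual quality: temporal fidelity
    word_alignment: Dict[str, float]  # prompt alignment per prompt word
    sentence_alignment: float         # prompt alignment for the whole prompt


def enumerate_videos(num_prompts: int = 113, num_models: int = 10) -> List[Tuple[int, int]]:
    """List every (prompt, model) pair, assuming one video per pair."""
    return list(product(range(num_prompts), range(num_models)))


pairs = enumerate_videos()
print(len(pairs))  # 113 * 10 = 1130, matching the reported dataset size
```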

Key Contribution

AI-generated educational videos often fail to represent accurately the concepts they are meant to teach. This benchmark dataset and evaluation framework are intended to measure that gap and help close it.

Abstract

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely untapped. To close this gap, we present EduAIGV-1k, the first benchmark dataset and evaluation framework dedicated to assessing the quality of AI-generated videos (AIGVs) designed to teach foundational math concepts, such as numbers and geometry, to young learners. EduAIGV-1k contains 1,130 short videos produced by ten state-of-the-art text-to-video (T2V) models using 113 pedagogy-oriented prompts. Each video is accompanied by rich, fine-grained annotations along two complementary axes: (1) perceptual quality, disentangled into spatial and temporal fidelity, and (2) prompt alignment, labeled at the word level and sentence level to quantify the degree to which each mathematical concept in the prompt is accurately grounded in the generated video. These fine-grained annotations transform each video into a multi-dimensional, interpretable supervision signal, far beyond a single quality score. Leveraging this dense feedback, we introduce EduVQA for both perceptual and alignment quality assessment of AIGVs. In particular, we propose a Structured 2D Mixture-of-Experts (S2D-MoE) module, which enhances the dependency between overall quality and each sub-dimension through shared experts and a dynamic 2D gating matrix. Extensive experiments show that EduVQA consistently outperforms existing VQA baselines. Both our dataset and code will be publicly available.
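The abstract does not specify how the S2D-MoE module is built, so the sketch below is not the authors' architecture. It is a toy illustration of the general idea of shared experts combined through an input-dependent gating matrix, under these assumptions: each expert is a linear scorer, the gating matrix has one row per quality dimension (overall, spatial, temporal) and one column per expert, and each row is normalized with a softmax. The class `Toy2DGatedMoE` and all of its parameters are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class Toy2DGatedMoE:
    """Toy model: shared linear experts mixed by a (tasks x experts) gating matrix."""

    def __init__(self, dim, num_experts, tasks, seed=0):
        rng = random.Random(seed)
        self.tasks = list(tasks)
        # Experts are shared: every task reads the same expert outputs.
        self.experts = [[rng.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(num_experts)]
        # One gating weight vector per (task, expert) cell.
        self.gate = [[[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(num_experts)]
                     for _ in self.tasks]

    def gating_matrix(self, x):
        """Input-dependent gating matrix; each row sums to 1."""
        return [softmax([dot(w, x) for w in task_row]) for task_row in self.gate]

    def forward(self, x):
        """Return one score per task as a gated mix of shared expert outputs."""
        expert_out = [dot(w, x) for w in self.experts]
        gates = self.gating_matrix(x)
        return {task: dot(g, expert_out) for task, g in zip(self.tasks, gates)}


moe = Toy2DGatedMoE(dim=4, num_experts=3, tasks=["overall", "spatial", "temporal"])
features = [0.2, -0.1, 0.5, 1.0]  # stand-in for extracted video features
print(moe.forward(features))
```

Because every task draws on the same expert outputs, the overall score and the sub-dimension scores are coupled through the experts and differ only in their gating rows, which is one simple way to read the dependency the abstract describes.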

Computer Vision | Eval Frameworks & Benchmarks | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
