East China University of Science and Technology | Feb 25, 2026 | arXiv:2602.21628

RuCL: Stratified Rubric-Based Curriculum Learning for Multimodal Large Language Model Reasoning

Yukun Chen, Jiaming Li, Longze Chen, Ze Gong, Jingpeng Li, Zhen Qin, Hengyu Chang, Ancheng Xu, Zhihao Yang, Hamid Alinejad-Rokny, Qiang Qu, Bo Zheng, Min Yang

AI Summary

The paper introduces Stratified Rubric-based Curriculum Learning (RuCL) to improve reasoning in Multimodal Large Language Models (MLLMs) by addressing reward hacking and inefficient training dynamics in existing rubric-based methods. RuCL generates generalized rubrics and stratifies them based on the model's competence, dynamically adjusting rubric weights during training to guide the model from basic perception to advanced reasoning. Experiments on visual reasoning benchmarks demonstrate that RuCL achieves a significant +7.83% average improvement over the Qwen2.5-VL-7B model, reaching a state-of-the-art accuracy of 60.06%.

Key Contribution

MLLM reasoning can be improved substantially by a curriculum that operates on reward design rather than data selection, dynamically weighting generalized rubrics according to the model's evolving competence.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a prevailing paradigm for enhancing reasoning in Multimodal Large Language Models (MLLMs). However, relying solely on outcome supervision risks reward hacking, where models learn spurious reasoning patterns to satisfy final answer checks. While recent rubric-based approaches offer fine-grained supervision signals, they suffer from high computational costs of instance-level generation and inefficient training dynamics caused by treating all rubrics as equally learnable. In this paper, we propose Stratified Rubric-based Curriculum Learning (RuCL), a novel framework that reformulates curriculum learning by shifting the focus from data selection to reward design. RuCL generates generalized rubrics for broad applicability and stratifies them based on the model's competence. By dynamically adjusting rubric weights during training, RuCL guides the model from mastering foundational perception to tackling advanced logical reasoning. Extensive experiments on various visual reasoning benchmarks show that RuCL yields a remarkable +7.83% average improvement over the Qwen2.5-VL-7B model, achieving a state-of-the-art accuracy of 60.06%.
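The abstract does not spell out how rubric weights are computed, so the following is only a minimal sketch of one way a competence-gated rubric curriculum could work, not the authors' implementation. Every name here (`RubricTier`, `update_pass_rate`, `tier_weights`, `rubric_reward`), the moving-average competence estimate, the `threshold` gate, and the `rubric_coef` mixing factor are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RubricTier:
    """A group of rubrics of similar difficulty (hypothetical structure)."""
    name: str
    rubrics: list[str]
    pass_rate: float = 0.0  # running estimate of how often the model satisfies this tier


def update_pass_rate(tier: RubricTier, batch_scores: list[float], momentum: float = 0.9) -> None:
    """Update the tier's competence estimate with an exponential moving average.

    batch_scores holds per-sample rubric scores in [0, 1] for this tier.
    """
    batch_mean = sum(batch_scores) / len(batch_scores)
    tier.pass_rate = momentum * tier.pass_rate + (1.0 - momentum) * batch_mean


def tier_weights(tiers: list[RubricTier], threshold: float = 0.7) -> list[float]:
    """Weight each tier by how well all easier tiers are already mastered.

    The first tier is always fully active. A later tier's raw weight is the
    product of min(1, pass_rate / threshold) over the tiers before it, so it
    stays near zero until the easier tiers are mostly satisfied. The weights
    are normalized to sum to 1. (Assumed scheme, not taken from the paper.)
    """
    raw = []
    gate = 1.0
    for tier in tiers:
        raw.append(gate)
        gate *= min(1.0, tier.pass_rate / threshold)
    total = sum(raw)
    return [w / total for w in raw]


def rubric_reward(tier_scores: list[float], weights: list[float],
                  answer_correct: bool, rubric_coef: float = 0.5) -> float:
    """Combine the verifiable outcome reward with the weighted rubric scores."""
    process = sum(w * s for w, s in zip(weights, tier_scores))
    return float(answer_correct) + rubric_coef * process


# Two tiers ordered from foundational perception to logical reasoning.
tiers = [
    RubricTier("perception", ["reads the relevant visual elements"]),
    RubricTier("reasoning", ["each inference follows from the previous step"]),
]
print(tier_weights(tiers))  # all weight on perception while its pass rate is 0
```

Under this scheme the reasoning tier contributes nothing at the start of training and reaches equal weight once the perception pass rate hits the threshold, which mirrors the perception-to-reasoning progression the abstract describes. Gating on earlier tiers, rather than using a fixed schedule, ties the curriculum to measured competence instead of the step count.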

Multimodal Models | Reasoning & Chain-of-Thought | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
