Feb 26, 2026 | arXiv:2602.22859

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Hongrui Jia, Chaoya Jiang, Shikun Zhang, Wei Ye, Weihao Ye

AI Summary

The paper introduces Diagnostic-driven Progressive Evolution (DPE), an iterative training paradigm for Large Multimodal Models (LMMs) in which a diagnosis of the model's weaknesses guides both data generation and reinforcement learning. DPE uses multiple agents to annotate and quality-control unlabeled multimodal data, attributes failures to specific weaknesses, and adjusts the data mixture accordingly. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, suggesting that DPE is effective for continual LMM training.

Key Contribution

Forget static datasets: this iterative training loop uses diagnostic feedback to continuously patch the blind spots in large multimodal models, leading to consistent performance gains.

Abstract

As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision-making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop in which diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, indicating that DPE is a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.
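
The loop described in the abstract (diagnose, re-weight the data mixture, generate weakness-focused data, train, re-diagnose) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `mixture_from_failures` and `dpe_loop`, the `floor` and `budget` parameters, the callback signatures, and the proportional re-weighting rule are all hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple


def mixture_from_failures(
    failure_tags: List[str], categories: List[str], floor: float = 0.05
) -> Dict[str, float]:
    """Turn diagnosed failure tags into sampling weights over weakness categories.

    ASSUMPTION: weights are proportional to failure counts, plus a small
    per-category floor so no capability is dropped from training entirely.
    The paper's actual mixture rule may differ.
    """
    remaining = 1.0 - floor * len(categories)
    if remaining < 0:
        raise ValueError("floor is too large for the number of categories")
    counts = Counter(t for t in failure_tags if t in categories)
    total = sum(counts.values())
    if total == 0:
        # No diagnosed failures: fall back to a uniform mixture.
        return {c: 1.0 / len(categories) for c in categories}
    return {c: floor + remaining * counts[c] / total for c in categories}


def dpe_loop(
    model: Any,
    diagnose: Callable[[Any], List[str]],      # hypothetical: model -> failure tags
    generate: Callable[[Dict[str, int]], Any],  # hypothetical: per-category quota -> data
    train: Callable[[Any, Any], Any],           # hypothetical: (model, data) -> new model
    categories: List[str],
    rounds: int = 3,
    budget: int = 1000,
) -> Tuple[Any, List[Dict[str, float]]]:
    """Run several diagnose -> generate -> train rounds, re-diagnosing each time."""
    history = []
    for _ in range(rounds):
        failure_tags = diagnose(model)  # re-diagnose the updated model
        mixture = mixture_from_failures(failure_tags, categories)
        quota = {c: round(budget * w) for c, w in mixture.items()}
        data = generate(quota)          # weakness-focused data
        model = train(model, data)      # targeted reinforcement step
        history.append(mixture)
    return model, history
```

In this sketch, a category that accounts for more diagnosed failures receives a larger share of the next round's generation budget, and the mixture is recomputed from a fresh diagnosis after every training step rather than fixed in advance.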

Multimodal Models | RLHF & Preference Learning | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
