This paper introduces Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that uses RL to jointly optimize multimodal LLMs across multiple tasks to improve their judgment capabilities. The approach addresses a limitation of existing judge models, which are typically optimized for single-task scenarios and therefore struggle to generalize to diverse contexts. Experiments show that MT-RL-Judge outperforms strong baselines in judgment consistency, correlation with human preferences, and generalization to out-of-distribution tasks.
MLLMs can now judge more consistently and generalize better thanks to a multi-task reinforcement learning approach that aligns them with human preferences across diverse visual tasks.
Multimodal Large Language Models (MLLMs) have been widely adopted as judges (MLLM-as-a-Judge) due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, which is a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks, leveraging the generalization capabilities of RL. Experimental results demonstrate that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. Furthermore, our approach exhibits robust generalization on out-of-distribution tasks, validating its effectiveness.
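To make the multi-task RL idea concrete, the sketch below shows a minimal REINFORCE-style training loop in which a judge policy is updated on batches drawn from a mixture of judgment tasks, with a reward for agreeing with the human-preferred response. This is an illustration under stated assumptions, not the paper's actual method: the task list, the agreement reward, `ToyJudgePolicy` (a stand-in for the MLLM), and `sample_batch` are all hypothetical, and the paper may use a different RL algorithm and reward design.

```python
# Minimal sketch of multi-task RL for a judge model (REINFORCE-style).
# Assumptions not from the paper: the task mix, the agreement reward,
# ToyJudgePolicy (a stand-in for the actual MLLM), and sample_batch.
import random
import torch
import torch.nn as nn

TASKS = ["image_captioning", "vqa", "text_to_image"]  # hypothetical task mix


class ToyJudgePolicy(nn.Module):
    """Stand-in for the MLLM judge: maps features of (prompt, response A,
    response B) to a log-probability distribution over preferences {A, B}."""

    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.head = nn.Linear(feat_dim, 2)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.log_softmax(self.head(feats), dim=-1)


def sample_batch(task: str, batch_size: int = 8):
    """Hypothetical loader: random features plus the human-preferred label."""
    feats = torch.randn(batch_size, 16)
    human_pref = torch.randint(0, 2, (batch_size,))
    return feats, human_pref


policy = ToyJudgePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    task = random.choice(TASKS)                 # jointly optimize across tasks
    feats, human_pref = sample_batch(task)
    log_probs = policy(feats)                   # log pi(judgment | input)
    judgment = torch.multinomial(log_probs.exp(), 1).squeeze(-1)
    reward = (judgment == human_pref).float()   # agreement with human labels
    advantage = reward - reward.mean()          # simple batch baseline
    chosen_logp = log_probs.gather(1, judgment.unsqueeze(-1)).squeeze(-1)
    loss = -(advantage * chosen_logp).mean()    # policy-gradient objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design point the sketch highlights is that every update samples from the pooled task distribution rather than a single task, which is what lets the shared judge policy pick up task-general judgment behavior instead of overfitting to one evaluation setting.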