This paper introduces Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that uses RL to jointly optimize multimodal LLMs across multiple tasks to improve their judgment capabilities. The approach addresses a limitation of existing judge models, which are typically optimized for single-task scenarios and therefore struggle to generalize to diverse contexts. Experiments show that MT-RL-Judge outperforms strong baselines in judgment consistency, correlation with human preferences, and generalization to out-of-distribution tasks.
MLLMs can now judge more consistently and generalize better thanks to a multi-task reinforcement learning approach that aligns them with human preferences across diverse visual tasks.
Multimodal Large Language Models (MLLMs) have been widely adopted as judges (MLLM-as-a-Judge) due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, which is a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks, leveraging the generalization capabilities of RL. Experimental results demonstrate that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. Furthermore, our approach exhibits robust generalization on out-of-distribution tasks, validating its effectiveness.
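To make the multi-task RL idea concrete, the sketch below shows a minimal REINFORCE-style training loop in which a judge policy is updated on batches drawn from a mixture of judgment tasks, with a reward for agreeing with the human-preferred response. This is an illustration under stated assumptions, not the paper's actual method: the task list, the agreement reward, `ToyJudgePolicy` (a stand-in for the MLLM), and `sample_batch` are all hypothetical, and the paper may use a different RL algorithm and reward design.

```python
# Minimal sketch of multi-task RL for a judge model (REINFORCE-style).
# Assumptions not from the paper: the task mix, the agreement reward,
# ToyJudgePolicy (a stand-in for the actual MLLM), and sample_batch.
import random
import torch
import torch.nn as nn

TASKS = ["image_captioning", "vqa", "text_to_image"]  # hypothetical task mix


class ToyJudgePolicy(nn.Module):
    """Stand-in for the MLLM judge: maps features of (prompt, response A,
    response B) to a log-probability distribution over preferences {A, B}."""

    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.head = nn.Linear(feat_dim, 2)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.log_softmax(self.head(feats), dim=-1)


def sample_batch(task: str, batch_size: int = 8):
    """Hypothetical loader: random features plus the human-preferred label."""
    feats = torch.randn(batch_size, 16)
    human_pref = torch.randint(0, 2, (batch_size,))
    return feats, human_pref


policy = ToyJudgePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    task = random.choice(TASKS)                 # jointly optimize across tasks
    feats, human_pref = sample_batch(task)
    log_probs = policy(feats)                   # log pi(judgment | input)
    judgment = torch.multinomial(log_probs.exp(), 1).squeeze(-1)
    reward = (judgment == human_pref).float()   # agreement with human labels
    advantage = reward - reward.mean()          # simple batch baseline
    chosen_logp = log_probs.gather(1, judgment.unsqueeze(-1)).squeeze(-1)
    loss = -(advantage * chosen_logp).mean()    # policy-gradient objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design point the sketch highlights is that every update samples from the pooled task distribution rather than a single task, which is what lets the shared judge policy pick up task-general judgment behavior instead of overfitting to one evaluation setting.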