Beihang, Meituan, PKU, USTC | Mar 9, 2026 | arXiv:2603.08035

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin

AI Summary

The paper introduces Contrast-Driven Rubric Reward Model (CDRRM), a framework that uses contrastive profiling on preference pairs to identify key discriminative factors, which are then synthesized into rubrics for guiding preference judgments. This approach aims to improve the interpretability and reliability of reward models while reducing reliance on extensive expert annotations. Experiments on RewardBench, RMBench, and RMB show that CDRRM achieves state-of-the-art performance, mitigates evaluation biases, and exhibits high data efficiency, outperforming fully fine-tuned baselines with only 3k training samples for the rubric generator.

Key Contribution

Forget noisy, biased LLM evaluators: CDRRM distills preference insights into compact rubrics, letting a frozen judge model leapfrog fully fine-tuned baselines with just 3k training samples.

Abstract

Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpretability and heavy reliance on costly expert annotations. While recent rubric-based approaches enhance evaluation transparency, they lack systematic quality control, yielding noisy and redundant criteria, failing to mitigate persistent biases (e.g., verbosity, position) in LLM evaluators, and creating a scalability-reliability trade-off. To address these limitations, we propose CDRRM (Contrast-Driven Rubric Reward Model), a framework built on a novel Contrast-then-Synthesis paradigm for high-quality rubric generation and guided preference judgment. CDRRM first conducts multi-dimensional contrastive profiling on preference pairs to identify causal discriminative factors, then synthesizes these insights into compact, context-aware rubrics to guide preference judgments. Extensive experiments on three authoritative benchmarks (RewardBench, RMBench, RMB) demonstrate that CDRRM achieves state-of-the-art performance across diverse domains and effectively mitigates the aforementioned evaluation biases. Notably, our approach delivers exceptional data efficiency: training the rubric generator on only 3k high-quality samples empowers a frozen pre-trained judge model to outperform fully fine-tuned baselines. This work offers a scalable, interpretable, and data-efficient path for reward modeling.
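The Contrast-then-Synthesis flow described in the abstract can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `LLM` callable interface, the prompt wording, the dimension names, and the helper names `contrastive_profile`, `synthesize_rubric`, and `judge` are all hypothetical, and the abstract does not specify how the trained rubric generator is queried at inference time.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical text-in, text-out model interface (an assumption, not a real API).
LLM = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def contrastive_profile(pair: PreferencePair, llm: LLM, dimensions: List[str]) -> List[str]:
    """Contrast step: collect one observation per dimension on how the two responses differ."""
    notes = []
    for dim in dimensions:
        query = (
            f"Compare the two responses on {dim}.\n"
            f"Prompt: {pair.prompt}\n"
            f"Chosen: {pair.chosen}\n"
            f"Rejected: {pair.rejected}\n"
            "State the key difference in one sentence."
        )
        notes.append(f"{dim}: {llm(query).strip()}")
    return notes


def synthesize_rubric(notes: List[str], llm: LLM, max_items: int = 3) -> List[str]:
    """Synthesis step: condense the observations into a short list of rubric items."""
    query = (
        f"Condense these observations into at most {max_items} rubric items, "
        "one per line:\n" + "\n".join(notes)
    )
    items = [line.strip("- \t") for line in llm(query).splitlines()]
    # Drop empty lines and cap the rubric length to keep it compact.
    return [item for item in items if item][:max_items]


def judge(prompt: str, response_a: str, response_b: str, rubric: List[str], judge_llm: LLM) -> str:
    """Guided judgment: a frozen judge picks 'A' or 'B' with the rubric in its context."""
    query = (
        "Using the rubric, answer with A or B only.\n"
        "Rubric:\n" + "\n".join(f"- {item}" for item in rubric) + "\n"
        f"Prompt: {prompt}\nResponse A: {response_a}\nResponse B: {response_b}"
    )
    verdict = judge_llm(query).strip().upper()
    # Any verdict that does not start with 'A' is treated as 'B' in this sketch.
    return "A" if verdict.startswith("A") else "B"
```

In this sketch the judge model is never updated; only the text of the rubric changes what it sees, which mirrors the abstract's claim that a frozen judge is guided by generated rubrics.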

Constitutional AI & AI Ethics | Interpretability & Mechanistic Interp | RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
