This paper introduces an interactive visualization tool for Reinforcement Learning from Human Feedback (RLHF) that enables users to compare groups of sampled behaviors instead of individual pairs. The interface consists of an exploration view using hierarchical clustering for overview and a comparison view for detailed group comparisons, enhanced by an active learning approach for suggesting informative groups. Evaluations on six simulated robotics tasks demonstrate that this groupwise comparison method improves final rewards by 69.34% compared to pairwise comparisons, while also reducing error rates and improving policy quality.
Ditch tedious pairwise comparisons: a new interactive visualization tool for RLHF lets you compare *groups* of behaviors, boosting rewards by nearly 70%.
Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behaviour with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which of two samples they prefer. We present an interactive visualisation that better exploits the human visual ability to compare and explore whole groups of samples. The interface comprises two linked views: 1) an exploration view showing a contextual overview of all sampled behaviours, organised in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviours for user queries. Users can efficiently explore large sets of behaviours by iterating between these two views. Additionally, we devised an active learning approach that suggests informative groups for comparison. As shown by our evaluation on six simulated robotics tasks, our approach increases the final rewards by 69.34% and leads to lower error rates and better policies. We open-source the code, which can be easily integrated into the RLHF training loop, supporting research on human–AI alignment.
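The abstract does not spell out how a groupwise judgment feeds into reward learning, but a common way to consume such feedback is to expand "group A preferred over group B" into the set of cross-group pairs and train a standard Bradley–Terry reward model on them. The sketch below illustrates that idea only; the function names, the expansion scheme, and the use of a Bradley–Terry loss are assumptions for illustration, not the paper's actual implementation.

```python
import itertools
import math


def group_to_pairs(preferred, rejected):
    """Expand one groupwise judgment ("every behaviour in `preferred`
    beats every behaviour in `rejected`") into pairwise labels."""
    return list(itertools.product(preferred, rejected))


def bradley_terry_loss(pairs, reward):
    """Average Bradley-Terry negative log-likelihood over the expanded
    pairs: -log sigmoid(r(winner) - r(loser)) for each pair."""
    total = 0.0
    for winner, loser in pairs:
        margin = reward(winner) - reward(loser)
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


# Toy usage: "behaviours" are scalars and the reward is the identity,
# so the preferred group genuinely scores higher.
pairs = group_to_pairs(preferred=[2.0, 3.0], rejected=[0.0, 1.0])
loss = bradley_terry_loss(pairs, reward=lambda x: x)
```

One groupwise query over groups of sizes m and n thus yields m×n pairwise constraints, which is one plausible reason group comparisons can be more label-efficient than individual pairs.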