USC · Mar 9, 2026 · arXiv:2603.09011

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, S. Nikolaidis, Maja Matarić

AI Summary

This paper introduces CMA-ES-IG, a preference learning algorithm for robots that explicitly optimizes for user experience by suggesting perceptually distinct and informative trajectories for users to rank. CMA-ES-IG combines Covariance Matrix Adaptation Evolution Strategies with Information Gain to balance preference learning outcomes with user expectations during the ranking process. Experiments in simulation and with a real robot demonstrate that CMA-ES-IG scales to higher-dimensional preference spaces, is robust to noisy feedback, and is preferred by users compared to existing methods.
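The summary describes CMA-ES-IG as pairing a CMA-ES sampling distribution with an information-gain criterion for choosing which behaviors to show the user. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation (their code is in the linked repository). It assumes behaviors are feature vectors and that the user's preference is a hidden weight vector `w`, represented by weighted samples ("particles"). It also assumes the user picks from a query according to a softmax choice model with a hypothetical rationality parameter `beta`. Candidates are drawn from an isotropic Gaussian rather than a fully adapted CMA-ES covariance. All function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def information_gain(query, particles, weights, beta=5.0):
    """Mutual information between the user's top choice in `query` and the
    hidden preference vector w, with w represented by weighted particles.
    Assumed choice model: P(pick x_i | w) = softmax(beta * w . x_i)."""
    per_particle = [softmax([beta * dot(w, x) for x in query]) for w in particles]
    marginal = [sum(pi * p[i] for pi, p in zip(weights, per_particle))
                for i in range(len(query))]
    expected_conditional = sum(pi * entropy(p) for pi, p in zip(weights, per_particle))
    return entropy(marginal) - expected_conditional


def sample_candidates(mean, sigma, n, rng):
    """Draw n candidate behaviors from N(mean, sigma^2 I).
    Simplification: real CMA-ES samples from an adapted full covariance."""
    return [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(n)]


def select_query(candidates, particles, weights, k, beta=5.0):
    """Greedily assemble a k-item query with high information gain:
    start from the best pair, then add one candidate at a time."""
    def gain(indices):
        return information_gain([candidates[i] for i in indices], particles, weights, beta)

    n = len(candidates)
    chosen = list(max(((i, j) for i in range(n) for j in range(i + 1, n)), key=gain))
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        chosen.append(max(remaining, key=lambda i: gain(chosen + [i])))
    return [candidates[i] for i in chosen]


# Usage: 3-D behavior space, 200 prior particles, 20 CMA-ES-style candidates.
rng = random.Random(0)
dim = 3
particles = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)
candidates = sample_candidates([0.0] * dim, 1.0, 20, rng)
query = select_query(candidates, particles, weights, k=4)
print(len(query), information_gain(query, particles, weights))
```

Information gain here is the mutual information between the user's choice and `w`. It is zero for a query of identical behaviors and grows when plausible preference vectors disagree about which item wins. That tends to favor mutually distinct items, which loosely mirrors the "perceptually distinct and informative" goal. The summary does not say how the paper itself measures or enforces perceptual distinctness.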

Key Contribution

Users prefer robots that learn their preferences using CMA-ES-IG because it suggests more perceptually distinct and informative behaviors to rank.

Abstract

Robots that interact with humans must adapt to individual users' preferences to operate effectively in human-centered environments. An intuitive and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, e.g., trajectories, gestures, or voices. Existing techniques primarily focus on generating queries that optimize preference learning outcomes, such as sample efficiency or final preference estimation accuracy. However, the focus on outcome overlooks key user expectations in the process of providing these rankings, which can negatively impact users' adoption of robotic systems. This work proposes the Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG) algorithm. CMA-ES-IG explicitly incorporates user experience considerations into the preference learning process by suggesting perceptually distinct and informative trajectories for users to rank. We demonstrate these benefits through both simulated studies and real-robot experiments. CMA-ES-IG, compared to state-of-the-art alternatives, (1) scales more effectively to higher-dimensional preference spaces, (2) maintains computational tractability for high-dimensional problems, (3) is robust to noisy or inconsistent user feedback, and (4) is preferred by non-expert users in identifying their preferred robot behaviors. This project's code is available at github.com/interaction-lab/CMA-ES-IG.
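Learning from rankings typically means updating a belief over the user's preference vector after each ranking. The sketch below is an illustration only. It assumes a Plackett-Luce ranking model with reward `r(x) = w · x` and a hypothetical rationality parameter `beta`, and the abstract does not say which likelihood CMA-ES-IG actually uses. A smaller `beta` treats rankings as noisier, so a single inconsistent ranking shifts the posterior less. This is one common way to obtain the kind of robustness to noisy feedback the abstract reports. Function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def plackett_luce_loglik(ranking, w, beta):
    """log P(ranking | w) under an assumed Plackett-Luce model.
    `ranking` lists behavior feature vectors best-first; reward r(x) = w . x.
    At each position, the item placed there competes with all items ranked below it."""
    scores = [beta * dot(w, x) for x in ranking]
    return sum(scores[t] - logsumexp(scores[t:]) for t in range(len(scores)))


def update_posterior(particles, weights, ranking, beta=1.0):
    """Bayesian reweighting of particles (all weights assumed > 0) after one ranking."""
    log_post = [math.log(pi) + plackett_luce_loglik(ranking, w, beta)
                for pi, w in zip(weights, particles)]
    norm = logsumexp(log_post)
    return [math.exp(lp - norm) for lp in log_post]


def posterior_mean(particles, weights):
    """Point estimate of the preference vector."""
    dim = len(particles[0])
    return [sum(pi * w[d] for pi, w in zip(weights, particles)) for d in range(dim)]


# Usage: two candidate preference vectors and one ranking (best first).
particles = [[1.0, 0.0], [-1.0, 0.0]]
prior = [0.5, 0.5]
ranking = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
print(update_posterior(particles, prior, ranking, beta=0.5))  # about [0.82, 0.18]
print(update_posterior(particles, prior, ranking, beta=5.0))  # nearly all mass on the first particle
```

In this two-particle example the log-odds shift between the particles is exactly `3 * beta`. With `beta = 0.5` the reversed-preference particle keeps roughly 18% of the mass, and with `beta = 5.0` it keeps almost none. This shows how `beta` controls how much a single, possibly noisy, ranking is trusted.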

RLHF & Preference Learning · Robotics & Embodied AI

Citation Metrics

Citations: 0
Influential citations: 0
References: 109
Year: 2026
Venue: N/A
