Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

Anthony Liang, Jiahui Zhang, Minyoung Hwang, Abrar Anwar, Sidhant Kaushik, Aditya Shah, Alex S. Huang, Luke Zettlemoyer, Dieter Fox, Anqi Li, Abhishek Gupta, Stephen Tu, Erdem Bıyık, Jesse Zhang

AI Summary

The paper introduces Robometer, a reward modeling framework for robotics that combines frame-level progress supervision on expert data with trajectory-comparison preference supervision, which lets it learn from suboptimal and failed trajectories as well. This addresses a scalability limitation of reward models that rely solely on dense progress labels from expert demonstrations. Trained on RBM-1M, a new dataset of over one million robot trajectories, Robometer learns more generalizable rewards than prior methods and improves robot learning performance on downstream tasks.

Key Contribution

Training a robotic reward model on over one million trajectories shows that adding comparisons between whole trajectories to frame-level progress supervision improves generalization and makes it possible to learn from suboptimal and failed data.

Abstract

General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints across trajectories of the same task, enabling effective learning from both real and augmented failed trajectories. To support this formulation at scale, we curate RBM-1M, a reward-learning dataset comprising over one million trajectories spanning diverse robot embodiments and tasks, including substantial suboptimal and failure data. Across benchmarks and real-world evaluations, Robometer learns more generalizable reward functions than prior methods and improves robot learning performance across a diverse set of downstream applications. Code, model weights, and videos at https://robometer.github.io/.
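
The dual objective described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulas, so everything concrete here is an assumption made for illustration: the progress loss is taken to be a mean squared error, the preference loss is taken to be a Bradley-Terry (logistic) loss on scalar trajectory scores, and the names `progress_loss`, `preference_loss`, `dual_objective`, and `pref_weight` are hypothetical rather than taken from the Robometer code.

```python
import math
from typing import Sequence


def progress_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Frame-level term (assumed MSE): predicted vs. labeled progress per frame."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Trajectory-level term (assumed Bradley-Terry): -log sigmoid(margin)."""
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def dual_objective(
    expert_pred: Sequence[float],
    expert_target: Sequence[float],
    score_preferred: float,
    score_rejected: float,
    pref_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Sum of the progress term on expert frames and the weighted preference term."""
    return progress_loss(expert_pred, expert_target) + pref_weight * preference_loss(
        score_preferred, score_rejected
    )


if __name__ == "__main__":
    # Expert trajectory with linear progress labels, plus one comparison in
    # which a successful trajectory should outscore a failed one.
    loss = dual_objective(
        expert_pred=[0.1, 0.4, 0.9],
        expert_target=[0.0, 0.5, 1.0],
        score_preferred=0.9,
        score_rejected=0.2,
    )
    print(f"dual objective: {loss:.4f}")
```

In this sketch the progress term is computed only on expert frames, where a progress label is well defined, while the preference term needs only an ordering between two trajectories of the same task. That ordering is what would allow failed or suboptimal trajectories to contribute without dense labels.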

RLHF & Preference Learning; Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 141

Year: 2026

Venue: N/A
