Jiangsu Xcmg Construction Machinery Research Institute Co.; University of Science; Mar 12, 2026; arXiv:2603.11600

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

Qi Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Ming Zhang

Department of Transportation Engineering, University of Science and Technology Beijing, China; Jiangsu Xcmg Construction Machinery Research Institute Co., Ltd.

AI Summary

The paper introduces Hybrid Energy-Aware Reward Shaping (H-EARS), a novel reinforcement learning method that combines potential-based reward shaping with energy-aware action regularization to improve convergence and energy efficiency. H-EARS achieves linear complexity by decomposing the potential function into task-specific and energy-based components, capturing dominant energy components without requiring full system dynamics. Theoretical analysis establishes functional independence, convergence acceleration, and convergence guarantees, while experiments and vehicle simulations demonstrate improved performance and applicability in safety-critical domains.

Key Contribution

By blending physics-informed priors into model-free RL, H-EARS offers a lightweight, linear-complexity approach to significantly boost convergence, stability, and energy efficiency in continuous control tasks.

Abstract

Deep reinforcement learning excels in continuous control but often requires extensive exploration, while physics-based models demand complete equations of motion and suffer from cubic complexity. This study proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which unifies potential-based reward shaping with energy-aware action regularization. H-EARS constrains action magnitude while balancing task-specific and energy-based potentials via functional decomposition, achieving linear complexity O(n) by capturing dominant energy components without full dynamics. We establish a theoretical foundation including: (1) functional independence for separate task/energy optimization; (2) energy-based convergence acceleration; (3) convergence guarantees under function approximation; and (4) approximate potential error bounds. Lyapunov stability connections are analyzed as heuristic guides. Experiments across baselines show improved convergence, stability, and energy efficiency. Vehicle simulations validate applicability in safety-critical domains under extreme conditions. Results confirm that integrating lightweight physics priors enhances model-free RL without complete system models, enabling transfer from lab research to industrial applications.
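To make the idea in the abstract concrete, the following is a minimal sketch of potential-based reward shaping combined with an action-magnitude penalty. It is not the authors' implementation: the abstract does not give the H-EARS formulas, so the potential functions, the weights `w_task` and `w_energy`, the penalty coefficient `lam`, the discount `GAMMA`, and the choice of kinetic energy as the "dominant energy component" are all illustrative assumptions. The shaping term uses the standard form gamma * Phi(s') - Phi(s), and each potential is a single pass over the state vector, so the cost is linear in the state dimension.

```python
import math

GAMMA = 0.99  # discount factor (assumed value)


def task_potential(pos, goal):
    """Hypothetical task potential: negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)


def energy_potential(vel, mass=1.0):
    """Hypothetical energy potential: negative kinetic energy only.

    Treating kinetic energy as the dominant energy term is an assumption;
    it needs no system dynamics and costs O(n) in the velocity dimension.
    """
    return -0.5 * mass * sum(v * v for v in vel)


def potential(state, goal, w_task=1.0, w_energy=0.1):
    """Weighted sum of the task and energy potentials (weights are assumed)."""
    pos, vel = state
    return w_task * task_potential(pos, goal) + w_energy * energy_potential(vel)


def shaped_reward(r_env, state, action, next_state, goal, lam=0.01):
    """Environment reward + potential-based shaping - action-magnitude penalty."""
    shaping = GAMMA * potential(next_state, goal) - potential(state, goal)
    action_penalty = lam * sum(a * a for a in action)
    return r_env + shaping - action_penalty


# Example transition: the agent moves toward the goal and picks up speed.
s = ((1.0, 0.0), (0.0, 0.0))       # (position, velocity)
s_next = ((0.5, 0.0), (1.0, 0.0))
r = shaped_reward(0.0, s, (1.0, 0.0), s_next, goal=(0.0, 0.0))
print(r)
```

In this sketch the shaped reward would replace the raw environment reward inside any model-free learner; the potentials depend only on the observed state, so no dynamics model is required.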

Robotics & Embodied AI; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
