Mar 15, 2026 | arXiv:2603.14600

A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

AI Summary

This paper introduces a visualization framework for interpreting reinforcement learning dynamics, focusing on the interplay between value estimation, policy optimization, and temporal-difference signals. The framework reconstructs the critic match loss surface, visualizes the actor loss landscape under a frozen critic, tracks trajectories of updates, and maps state-TD errors. Applied to the ADHDP algorithm for spacecraft attitude control, the framework elucidates how different training stabilizers and target updates impact the optimization landscape and learning stability.

Key Contribution

A visualization framework that shows how TD errors shape the optimization landscape and drive policy updates during reinforcement learning.

Abstract

Reinforcement learning algorithms have been widely used in dynamic and control systems. However, interpreting their internal learning behavior remains a challenge. In the authors' previous work, a critic match loss landscape visualization method was proposed to study critic training. This study extends that method into a framework that provides a multi-perspective view of the learning dynamics, clarifying how value estimation, policy optimization, and temporal-difference (TD) signals interact during training. The proposed framework includes four complementary components: a three-dimensional reconstruction of the critic match loss surface that shows how TD targets shape the optimization geometry; an actor loss landscape under a frozen critic that reveals how the policy exploits that geometry; a trajectory combining time, Bellman error, and policy weights that indicates how updates move across the surface; and a state-TD map that identifies the state regions that drive those updates. The Action-Dependent Heuristic Dynamic Programming (ADHDP) algorithm for spacecraft attitude control is used as a case study. The framework is applied to compare several ADHDP variants and shows how training stabilizers and target updates change the optimization landscape and affect learning stability. The proposed framework therefore provides a systematic and interpretable tool for analyzing reinforcement learning behavior across algorithmic designs.
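To make the first component more concrete, the sketch below evaluates a critic loss on a two-dimensional slice through weight space, which is one common way to obtain a surface that can be plotted in three dimensions. It is an illustration only and not the authors' implementation: the abstract does not define the critic match loss, so the code assumes a mean squared TD error against a frozen target, and the linear critic, the random feature data, the discount `gamma`, and the helper names `critic`, `critic_match_loss`, and `loss_surface` are all hypothetical.

```python
import numpy as np


def critic(w, feats):
    """Hypothetical linear critic: one value estimate per feature row."""
    return feats @ w


def critic_match_loss(w, feats, feats_next, cost, w_target, gamma=0.95):
    """Assumed form of the loss: half the mean squared TD error, with the
    TD target computed from frozen target weights w_target."""
    target = cost + gamma * critic(w_target, feats_next)
    td_error = critic(w, feats) - target
    return 0.5 * np.mean(td_error ** 2)


def loss_surface(w_center, d1, d2, loss_fn, span=1.0, n=21):
    """Evaluate loss_fn on the plane w_center + a*d1 + b*d2."""
    alphas = np.linspace(-span, span, n)
    betas = np.linspace(-span, span, n)
    surface = np.empty((n, n))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            surface[i, j] = loss_fn(w_center + a * d1 + b * d2)
    return alphas, betas, surface


# Synthetic stand-in data (not from the paper).
rng = np.random.default_rng(0)
dim, batch = 6, 256
feats = rng.normal(size=(batch, dim))
feats_next = rng.normal(size=(batch, dim))
cost = rng.uniform(size=batch)
w_center = rng.normal(size=dim)

# Two orthonormal random directions spanning the slice.
d1 = rng.normal(size=dim)
d1 /= np.linalg.norm(d1)
d2 = rng.normal(size=dim)
d2 -= (d2 @ d1) * d1
d2 /= np.linalg.norm(d2)

alphas, betas, surface = loss_surface(
    w_center,
    d1,
    d2,
    lambda w: critic_match_loss(w, feats, feats_next, cost, w_target=w_center),
)
```

The resulting `surface` array can be passed to any 3D plotting routine together with `alphas` and `betas`. Recomputing it with a different `w_target` would show, under these assumptions, how a change in the TD target moves the surface while the slice directions stay fixed.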

Interpretability & Mechanistic Interp | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
