Apr 23, 2026 · arXiv:2604.21464

Dynamical Priors as a Training Objective in Reinforcement Learning

AI Summary

The paper introduces Dynamical Prior Reinforcement Learning (DP-RL), a framework that incorporates an auxiliary loss based on external state dynamics to shape the temporal evolution of action probabilities in RL agents. This approach promotes temporally coherent behavior by implementing evidence accumulation and hysteresis, without altering the reward function, environment, or policy architecture. Experiments across three minimal environments demonstrate that DP-RL can systematically alter decision trajectories, leading to more structured and interpretable behavior.

Key Contribution

RL policies need not be temporally incoherent: an auxiliary loss derived from dynamical priors shapes how action probabilities evolve over time, yielding more structured and interpretable decision trajectories.

Abstract

Standard reinforcement learning (RL) optimizes policies for reward but imposes few constraints on how decisions evolve over time. As a result, policies may achieve high performance while exhibiting temporally incoherent behavior such as abrupt confidence shifts, oscillations, or degenerate inactivity. We introduce Dynamical Prior Reinforcement Learning (DP-RL), a training framework that augments policy gradient learning with an auxiliary loss derived from external state dynamics that implement evidence accumulation and hysteresis. Without modifying the reward, environment, or policy architecture, this prior shapes the temporal evolution of action probabilities during learning. Across three minimal environments, we show that dynamical priors systematically alter decision trajectories in task-dependent ways, promoting temporally structured behavior that cannot be explained by generic smoothing. These results demonstrate that training objectives alone can control the temporal geometry of decision-making in RL agents.
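The abstract does not give the form of the external dynamics or of the auxiliary loss, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a leaky integrator for evidence accumulation, a two-threshold switch for hysteresis, a squared-error auxiliary term, and a fixed weight on that term. Every name and parameter (`prior_targets`, `auxiliary_loss`, `total_loss`, `leak`, `on`, `off`, `p_high`, `p_low`, `weight`) is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prior_targets(evidence, leak=0.2, on=0.8, off=0.2, p_high=0.9, p_low=0.1):
    """Hypothetical external dynamics: leaky accumulation plus hysteresis.

    Returns one target action probability per time step.
    """
    x = 0.0            # accumulated evidence (external state, not part of the policy)
    committed = False  # hysteretic switch state
    targets = []
    for e in evidence:
        x = (1.0 - leak) * x + e      # evidence accumulation with leak
        p = sigmoid(x)
        # Hysteresis: commit above `on`, release below `off`, otherwise hold.
        if not committed and p > on:
            committed = True
        elif committed and p < off:
            committed = False
        targets.append(p_high if committed else p_low)
    return targets


def auxiliary_loss(action_probs, targets):
    """Assumed auxiliary term: mean squared gap between the policy's
    per-step action probabilities and the prior's targets."""
    return sum((p - t) ** 2 for p, t in zip(action_probs, targets)) / len(targets)


def total_loss(policy_gradient_loss, aux, weight=0.1):
    """Reward, environment, and architecture are untouched; only the
    training objective gains a weighted extra term."""
    return policy_gradient_loss + weight * aux


# Usage: evidence that rises, weakens, then reverses.
evidence = [1, 1, 1, -0.2, -0.2, -1, -1, -1]
targets = prior_targets(evidence)
aux = auxiliary_loss([0.5] * len(targets), targets)
loss = total_loss(1.0, aux)
```

In this sketch the target stays high for several steps after the evidence turns negative and drops only once the accumulated state falls below the lower threshold. A plain moving average of the action probabilities would not produce that kind of history dependence.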

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 21
Year: 2026
Venue: N/A
