This paper introduces Space-sampled Value Decay, a novel forgetting mechanism designed to enhance value-based deep reinforcement learning (RL) in non-stationary environments. By allowing agents to adapt their behavior without requiring explicit information about environmental changes, the method shows promising improvements in performance for Deep Q-networks (DQN) and Soft Actor-Critic (SAC) architectures. The findings reveal that while the approach effectively mitigates the impact of drift, it also presents certain limitations in the returns achieved in dynamic settings.
Forgetting mechanisms can significantly boost the adaptability of RL agents in changing environments, even without explicit drift information.
Studies on rodents such as mice have shown that they can adapt their behavior when parameters of the environment change (``drift''), even if no information about the change is provided (uncertainty); such behavior can be modeled by forgetting mechanisms. Non-stationary Reinforcement Learning (NSRL) adapts state-of-the-art RL methods to changing environments; these adaptations, however, usually require (partially) perfect information about the drift, such as ``task IDs'' or ``context''. To mitigate the effects of drift, this work develops \emph{Space-sampled Value Decay}, a simple yet effective explicit forgetting mechanism for value-based deep RL architectures. In particular, we demonstrate and discuss positive effects, but also limitations, in the returns achieved by modifications of Deep Q-networks (DQN) and Soft Actor-Critic (SAC) when evaluated on non-stationary environments.