This paper introduces Symbolic Reward Machines (SRMs) to address a limitation of Reward Machines (RMs) in reinforcement learning: RMs require a user-defined labeling function for each environment and task. SRMs process environment observations directly through guards expressed as symbolic formulas, which removes the need for manual labeling. The authors report that SRMs, together with the proposed QSRM and LSRM learning algorithms, outperform baseline RL approaches and match the results of existing RM methods, while adhering to the standard environment definition and providing interpretable task representations.
Symbolic Reward Machines let reinforcement learning agents learn sparse, temporally extended tasks without a hand-written labeling function for each environment.
Reward Machines (RMs) are an established mechanism in Reinforcement Learning (RL) to represent and learn sparse, temporally extended tasks with non-Markovian rewards. RMs rely on high-level information in the form of labels that are emitted by the environment alongside the observation. However, this concept requires manual user input for each environment and task. The user has to create a suitable labeling function that computes the labels. These limitations lead to poor applicability in widely adopted RL frameworks. We propose Symbolic Reward Machines (SRMs) together with the learning algorithms QSRM and LSRM to overcome the limitations of RMs. SRMs consume only the standard output of the environment and process the observation directly through guards that are represented by symbolic formulas. In our evaluation, our SRM methods outperform the baseline RL approaches and generate the same results as the existing RM methods. At the same time, our methods adhere to the widely used environment definition and provide interpretable representations of the task to the user.
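To make the idea of guards over raw observations concrete, here is a minimal sketch of what such a machine could look like. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `(x, y)` observation layout, the thresholds, and the reward values are all hypothetical, and the QSRM and LSRM learning algorithms are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumption: an observation is a flat sequence of floats, here (x, y).
Observation = Sequence[float]


@dataclass
class Transition:
    """One edge of the machine (hypothetical structure)."""
    guard: Callable[[Observation], bool]  # symbolic formula evaluated on the raw observation
    formula: str                          # human-readable form of the guard
    target: int                           # machine state entered when the guard holds
    reward: float                         # reward emitted when the edge is taken


class SymbolicRewardMachine:
    """Finite-state machine whose edges are guarded by formulas over observations.

    Unlike a classic Reward Machine, no labeling function is needed:
    the guards read the environment's standard observation directly.
    """

    def __init__(self, transitions: Dict[int, List[Transition]], initial: int = 0):
        self.transitions = transitions
        self.initial = initial
        self.state = initial

    def reset(self) -> None:
        self.state = self.initial

    def step(self, obs: Observation) -> float:
        # Take the first outgoing edge whose guard is satisfied.
        for t in self.transitions.get(self.state, []):
            if t.guard(obs):
                self.state = t.target
                return t.reward
        # No guard holds: stay in the current state with zero reward.
        return 0.0


# Hypothetical task: first reach x >= 5, then reach y >= 5.
srm = SymbolicRewardMachine({
    0: [Transition(lambda o: o[0] >= 5.0, "x >= 5", target=1, reward=0.0)],
    1: [Transition(lambda o: o[1] >= 5.0, "y >= 5", target=2, reward=1.0)],
})

trajectory = [(0.0, 0.0), (0.0, 6.0), (5.0, 0.0), (5.0, 6.0)]
rewards = [srm.step(obs) for obs in trajectory]
print(rewards)    # [0.0, 0.0, 0.0, 1.0]
print(srm.state)  # 2
```

The second observation satisfies `y >= 5` but earns no reward, because the machine has not yet seen `x >= 5`. This is the non-Markovian, temporally extended behavior that the machine state captures, and the `formula` strings hint at how such a machine can stay readable for the user.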