University of Verona | Apr 28, 2026 | arXiv:2604.25534

Sample-efficient Neuro-symbolic Proximal Policy Optimization

Simone Murari, Celeste Veronese, Daniele Meli

AI Summary

This paper introduces two neuro-symbolic extensions to Proximal Policy Optimization (PPO), H-PPO-Product and H-PPO-SymLoss, that leverage logical policy specifications learned in simpler environments to guide learning in more complex, sparse-reward settings. H-PPO-Product biases the action distribution during sampling, while H-PPO-SymLoss adds a symbolic regularization term to the PPO loss. Experiments across OfficeWorld, WaterWorld, and DoorKey demonstrate that these methods achieve faster learning and higher returns compared to PPO and Reward Machine baselines, even with imperfect symbolic knowledge.

Key Contribution

Symbolic guidance transferred from simpler environments speeds up PPO learning and raises returns in sparse-reward environments, even when the symbolic knowledge is imperfect.

Abstract

Deep Reinforcement Learning (DRL) algorithms often require large amounts of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We introduce two integrations of symbolic guidance: (i) H-PPO-Product, which biases the action distribution at sampling time, and (ii) H-PPO-SymLoss, which augments the PPO loss with a symbolic regularization term. We evaluate our methods on three benchmarks (OfficeWorld, WaterWorld, and DoorKey), showing consistently faster learning and higher return at convergence than PPO and a Reward Machine baseline, including when the symbolic knowledge is imperfect.
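The abstract does not give the exact formulas, so the following is only a minimal sketch of how the two integrations could look, not the authors' implementation. It assumes a discrete action space and a symbolic prior expressed as a probability vector over actions. The function names (`product_biased_policy`, `ppo_loss_with_symbolic_term`), the cross-entropy form of the regularizer, and the weight `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def product_biased_policy(policy_probs, symbolic_prior, eps=1e-8):
    """Assumed form of sampling-time biasing: elementwise product of the
    neural policy and a symbolic prior over actions, then renormalization.
    `eps` keeps actions the prior rules out from getting exactly zero mass."""
    mixed = policy_probs * (symbolic_prior + eps)
    return mixed / mixed.sum()


def ppo_loss_with_symbolic_term(ratio, advantage, new_probs, symbolic_prior,
                                clip=0.2, lam=0.1, eps=1e-8):
    """Standard clipped PPO surrogate plus an assumed regularizer:
    the cross-entropy between the symbolic prior and the current policy.

    ratio, advantage: shape (batch,)
    new_probs, symbolic_prior: shape (batch, n_actions)
    """
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    ppo_loss = -surrogate.mean()
    # Hypothetical symbolic term: small when the policy puts mass on
    # the actions suggested by the symbolic prior.
    sym_term = -(symbolic_prior * np.log(new_probs + eps)).sum(axis=-1).mean()
    return ppo_loss + lam * sym_term


# Usage: a uniform policy over 4 actions, biased toward action 0.
rng = np.random.default_rng(0)
policy = np.full(4, 0.25)
prior = np.array([0.7, 0.1, 0.1, 0.1])
biased = product_biased_policy(policy, prior)
action = rng.choice(4, p=biased)  # sample from the biased distribution
```

In this sketch the product variant changes only which actions are sampled, while the loss variant changes the gradient signal; the weight `lam` would control how strongly imperfect symbolic knowledge can pull the policy.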

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
