This paper introduces GraSP-STL, a novel graph-search framework for zero-shot Signal Temporal Logic (STL) planning using offline reinforcement learning. The method learns a goal-conditioned value function to induce a reachability metric, constructs a directed graph abstraction representing feasible transitions, and performs graph search to find waypoint sequences satisfying unseen STL specifications. Experiments demonstrate the framework's ability to generalize to new STL tasks and perform long-horizon planning by composing short-horizon behaviors learned from offline data.
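Below is a minimal sketch of the reachability-graph idea summarized above. It rests on several assumptions. `toy_value` is a hypothetical stand-in for the learned goal-conditioned value function: it is a distance-based toy, not a trained network. Converting a value into a step count assumes a reward of -1 per step until the goal is reached. The search is plain Dijkstra shortest path, not the STL-robustness-guided search over waypoint sequences that GraSP-STL performs. All function names are illustrative only.

```python
import heapq
import math

# HYPOTHETICAL stand-in for a learned goal-conditioned value V(s, g).
# Assumption: reward is -1 per step until g is reached, so -V(s, g)
# approximates the number of steps needed to reach g from s.
def toy_value(s, g, speed=1.0):
    return -math.dist(s, g) / speed

def reachability_metric(value_fn, s, g):
    """Estimated steps-to-reach derived from the value (assumed -1/step reward)."""
    return -value_fn(s, g)

def build_graph(nodes, value_fn, horizon):
    """Directed edge i -> j when node j is estimated reachable from node i within `horizon` steps."""
    edges = {i: [] for i in range(len(nodes))}
    for i, s in enumerate(nodes):
        for j, g in enumerate(nodes):
            if i == j:
                continue
            d = reachability_metric(value_fn, s, g)
            if d <= horizon:
                edges[i].append((j, d))
    return edges

def shortest_waypoints(edges, start, goal):
    """Plain Dijkstra over the graph abstraction; returns a node sequence or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            # Walk predecessor links back to the start to recover the waypoints.
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None  # goal not reachable in the abstraction
```

Take four collinear nodes one unit apart with a horizon of 1.5. Only neighbouring nodes are connected, so a search from node 0 to node 3 returns the waypoint sequence [0, 1, 2, 3]. Shrinking the horizon below 1 leaves no edges, and the search returns None. In the method described here, candidate waypoint sequences would instead be scored against the STL specification and not by summed step counts.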
Offline RL can now tackle complex, unseen temporal logic tasks without retraining, by stitching together learned short-horizon behaviors into long-horizon plans.
This paper studies offline, zero-shot planning under Signal Temporal Logic (STL) specifications. We assume access only to an offline dataset of state-action-state transitions collected by a task-agnostic behavior policy, with no analytical dynamics model, no further environment interaction, and no task-specific retraining. The objective is to synthesize a control strategy whose resulting trajectory satisfies an arbitrary, previously unseen STL specification. To this end, we propose GraSP-STL, a graph-search-based framework for zero-shot STL planning from offline trajectories. The method learns a goal-conditioned value function from offline data and uses it to induce a finite-horizon reachability metric over the state space. Based on this metric, it constructs a directed graph abstraction whose nodes are representative states and whose edges encode feasible short-horizon transitions. Planning is then formulated as a graph search over waypoint sequences. Candidate sequences are evaluated using arithmetic-geometric mean (AGM) robustness and its interval semantics, and the chosen sequence is executed by a learned goal-conditioned policy. The framework separates reusable reachability learning from task-conditioned planning. This separation enables zero-shot generalization to unseen STL tasks and long-horizon planning through the composition of short-horizon behaviors learned from offline data. Experimental results demonstrate its effectiveness on a range of offline STL planning tasks.
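Candidate waypoint sequences are scored with arithmetic-geometric mean (AGM) robustness, so a short sketch of the AGM conjunction and disjunction operators may help. It follows the standard AGM definitions as we understand them, with robustness values assumed to be normalized to [-1, 1] beforehand. The paper's interval semantics are not reproduced. The helper names (`agm_and`, `agm_or`, `always`, `eventually`) are hypothetical. `always` and `eventually` are simplified versions that apply the operators across every step of a finite trajectory window.

```python
import math

def _pos(x):
    return max(0.0, x)

def _neg(x):
    return min(0.0, x)

def agm_and(rhos):
    """AGM conjunction; inputs assumed normalized to [-1, 1]."""
    m = len(rhos)
    if all(r > 0 for r in rhos):
        # All conjuncts satisfied: geometric mean of (1 + rho), shifted back by 1.
        return math.prod(1.0 + r for r in rhos) ** (1.0 / m) - 1.0
    # Some conjunct violated: average of the negative parts only.
    return sum(_neg(r) for r in rhos) / m

def agm_or(rhos):
    """AGM disjunction; inputs assumed normalized to [-1, 1]."""
    m = len(rhos)
    if any(r > 0 for r in rhos):
        # Some disjunct satisfied: built from the positive parts only.
        return 1.0 - math.prod(1.0 - _pos(r) for r in rhos) ** (1.0 / m)
    # No disjunct satisfied: arithmetic mean of all values.
    return sum(rhos) / m

# Simplified temporal operators over a whole finite window (an assumption:
# no interval bounds, one robustness value per time step).
def always(pred, traj):
    return agm_and([pred(x) for x in traj])

def eventually(pred, traj):
    return agm_or([pred(x) for x in traj])
```

For example, `agm_and([0.5, 0.5])` gives 0.5 and `agm_and([0.5, -0.2])` gives -0.1. Min-based robustness depends only on the worst conjunct. Under AGM, a violated conjunction is scored by the average of its negative parts, and a satisfied one rewards every conjunct holding with margin.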