Apr 23, 2026 | arXiv:2604.21456

Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

AI Summary

This paper introduces Tempered Sequential Monte Carlo (TSMC), a sampling-based method for finite-horizon trajectory and policy optimization that frames controller design as inference. The objective is a KL-regularized expected trajectory cost, whose minimizer is a Boltzmann-tilted distribution over controller parameters. TSMC samples from this target by adaptively reweighting and resampling particles along a tempering path that starts at the prior and ends at the target. Experiments on trajectory- and policy-optimization benchmarks show that TSMC compares favorably to state-of-the-art baselines.
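The objective and tempering path described above can be written out as follows. This is a sketch under assumed notation, none of which is taken from the paper: $\theta$ for controller parameters, $J(\theta)$ for trajectory cost, $p(\theta)$ for the prior, $\lambda > 0$ for the temperature, and $\beta_k$ for the tempering schedule. The geometric path shown is a common choice for tempered SMC and may differ from the paper's exact construction.

```latex
% KL-regularized objective over distributions q on controller parameters
\min_{q}\; \mathbb{E}_{\theta \sim q}\!\left[ J(\theta) \right]
  + \lambda\, \mathrm{KL}\!\left( q \,\|\, p \right)

% Its minimizer is the Boltzmann-tilted prior
q^{*}(\theta) = \frac{1}{Z_{\lambda}}\, p(\theta)\,
  \exp\!\left( -\frac{J(\theta)}{\lambda} \right),
\qquad
Z_{\lambda} = \int p(\theta)\, \exp\!\left( -\frac{J(\theta)}{\lambda} \right) d\theta

% Tempering path from the prior (beta = 0) to the target (beta = 1)
\pi_{\beta}(\theta) \propto p(\theta)\,
  \exp\!\left( -\frac{\beta\, J(\theta)}{\lambda} \right),
\qquad 0 = \beta_{0} < \beta_{1} < \dots < \beta_{K} = 1

% Incremental importance weight of particle i when moving from beta_{k-1} to beta_k
w_{k}^{(i)} \propto
  \exp\!\left( -\frac{(\beta_{k} - \beta_{k-1})\, J(\theta^{(i)})}{\lambda} \right)
```

As $\lambda$ decreases, $q^{*}$ concentrates on low-cost parameters, which is why the intermediate distributions $\pi_{\beta}$ are useful as a bridge from the prior.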

Key Contribution

Controller design can be effectively framed as inference, enabling efficient trajectory and policy optimization via tempered sampling.

Abstract

We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
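The loop described in the abstract (adaptive reweighting, resampling, and HMC rejuvenation along a tempering path) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the toy dynamics x_{t+1} = x_t + u_t, the quadratic cost, the Gaussian prior, the constants, the ESS-based bisection for the next temperature, multinomial resampling, and all function names. The gradient is written by hand for this toy rollout, where a real implementation would use automatic differentiation.

```python
import math
import random

# Illustrative constants (assumptions, not values from the paper).
T, X0, GOAL, R = 3, 0.0, 1.0, 0.1   # horizon, start state, goal state, control penalty
LAM, SIGMA = 0.1, 1.0               # temperature lambda and prior standard deviation

def cost(u):
    """Roll out x_{t+1} = x_t + u_t and return the trajectory cost J(u)."""
    x = X0
    for ut in u:
        x += ut
    return (x - GOAL) ** 2 + R * sum(ut * ut for ut in u)

def grad_cost(u):
    """Gradient of J(u), derived by hand for this toy rollout."""
    err = X0 + sum(u) - GOAL
    return [2.0 * err + 2.0 * R * ut for ut in u]

def neg_log_target(u, beta):
    """Negative log of pi_beta(u), proportional to prior(u) * exp(-beta * J(u) / LAM)."""
    return sum(ut * ut for ut in u) / (2 * SIGMA ** 2) + beta * cost(u) / LAM

def grad_neg_log_target(u, beta):
    return [ut / SIGMA ** 2 + beta * gi / LAM for ut, gi in zip(u, grad_cost(u))]

def hmc_step(u, beta, rng, eps=0.05, n_leap=10):
    """One HMC move (leapfrog plus Metropolis correction) that leaves pi_beta invariant."""
    p = [rng.gauss(0, 1) for _ in u]
    h0 = neg_log_target(u, beta) + 0.5 * sum(pi * pi for pi in p)
    q = list(u)
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, grad_neg_log_target(q, beta))]
    for i in range(n_leap):
        q = [qi + eps * pi for qi, pi in zip(q, p)]
        scale = eps if i < n_leap - 1 else 0.5 * eps  # final momentum update is a half step
        p = [pi - scale * gi for pi, gi in zip(p, grad_neg_log_target(q, beta))]
    h1 = neg_log_target(q, beta) + 0.5 * sum(pi * pi for pi in p)
    return q if rng.random() < math.exp(min(0.0, h0 - h1)) else u

def ess_and_weights(costs, delta):
    """Effective sample size and unnormalized weights for a temperature increment delta."""
    logw = [-delta * c / LAM for c in costs]
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    return sum(w) ** 2 / sum(x * x for x in w), w

def next_beta(costs, beta, target_ess):
    """Bisect for the next temperature so that the ESS stays near target_ess."""
    if ess_and_weights(costs, 1.0 - beta)[0] >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if ess_and_weights(costs, mid - beta)[0] >= target_ess:
            lo = mid
        else:
            hi = mid
    return hi

def tsmc(n=200, ess_frac=0.5, seed=0):
    """Run the tempered SMC sketch; returns final particles and the beta schedule."""
    rng = random.Random(seed)
    particles = [[rng.gauss(0, SIGMA) for _ in range(T)] for _ in range(n)]
    beta, betas = 0.0, [0.0]
    while beta < 1.0:
        costs = [cost(u) for u in particles]
        new_beta = next_beta(costs, beta, ess_frac * n)           # adapt the temperature
        _, w = ess_and_weights(costs, new_beta - beta)            # reweight
        particles = [list(u) for u in rng.choices(particles, weights=w, k=n)]  # resample
        beta = new_beta
        particles = [hmc_step(u, beta, rng) for u in particles]   # rejuvenate
        betas.append(beta)
    return particles, betas

if __name__ == "__main__":
    final_particles, schedule = tsmc()
    mean_cost = sum(cost(u) for u in final_particles) / len(final_particles)
    print("tempering steps:", len(schedule) - 1, "mean final cost:", mean_cost)
```

Choosing each temperature by bisection on the effective sample size is one common way to make the schedule adaptive; the abstract does not state which criterion the paper uses.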

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 112

Year: 2026

Venue: N/A
