Apr 1, 2026 | arXiv:2604.00977

Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization

Ruijie Hao, Longfei Zhang, Yang Dai, Yang Ma, Xingxing Liang, Guangquan Cheng

AI Summary

This paper introduces Flow-based Policy with Distributional RL (FP-DRL), an algorithm that models the policy with flow matching so that it can capture multimodal action distributions, addressing a limitation of traditional diagonal Gaussian policies in reinforcement learning. FP-DRL also employs distributional RL to model and optimize the entire return distribution rather than only its mean, providing more effective guidance for policy updates. Experiments on MuJoCo benchmarks show that FP-DRL achieves state-of-the-art performance on most control tasks and that the flow policy has superior representation capability.

Key Contribution

Ditch unimodal policies: a flow-matching policy that can represent multimodal action distributions, guided by a distributional RL critic that models the full return distribution, reaches SOTA performance on most MuJoCo control tasks.

Abstract

Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, most traditional RL algorithms have two limitations. First, the policy is typically parameterized as a diagonal Gaussian distribution, which prevents it from capturing multimodal distributions and makes it difficult to cover the full range of optimal solutions in multi-solution problems. Second, the return is reduced to a mean value, which discards its multimodal nature and provides insufficient guidance for policy updates. In response to these problems, we propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy using flow matching, which offers both computational efficiency and the capacity to fit complex distributions. Additionally, it employs a distributional RL approach to model and optimize the entire return distribution, thereby more effectively guiding multimodal policy updates and improving agent performance. Experimental trials on MuJoCo benchmarks demonstrate that FP-DRL achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks, and that the flow policy exhibits superior representation capability.
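To make the idea of a flow-matching policy concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common linear-interpolation probability path and a fixed-step Euler integrator; the abstract does not specify either choice. The names `flow_matching_pair`, `sample_action`, and `toy_velocity` are hypothetical, and `toy_velocity` is a closed-form stand-in for a trained velocity network.

```python
import numpy as np


def flow_matching_pair(noise, action, t):
    """Return a point on the straight path from noise to action, plus the
    velocity a network would be regressed onto at that point.

    Assumes the linear path x_t = (1 - t) * noise + t * action.
    """
    x_t = (1.0 - t) * noise + t * action
    target_velocity = action - noise
    return x_t, target_velocity


def sample_action(velocity_fn, state, action_dim, n_steps=10, rng=None):
    """Draw an action by integrating da/dt = velocity_fn(a, t, state)
    from t = 0 (Gaussian noise) to t = 1 with explicit Euler steps."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal(action_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        a = a + dt * velocity_fn(a, t, state)
    return a


def toy_velocity(a, t, state):
    """Hypothetical stand-in for a learned velocity field: it transports
    any starting sample to tanh(state) at t = 1. A real policy would use
    a trained neural network here."""
    target = np.tanh(state[: a.shape[0]])
    return (target - a) / (1.0 - t)


if __name__ == "__main__":
    state = np.array([0.5, -1.0])
    action = sample_action(toy_velocity, state, action_dim=2,
                           rng=np.random.default_rng(0))
    print(action)
```

Because the sampler starts from noise and follows a learned velocity field, the resulting action distribution is not restricted to a single Gaussian mode, which is the property the abstract attributes to the flow policy. The toy field above collapses to a single target only to keep the sketch checkable.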

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
