This paper introduces Mean Flow Policy Optimization (MFPO), a reinforcement learning algorithm that uses MeanFlow models as a more efficient alternative to diffusion models for policy representation. MFPO addresses two challenges specific to MeanFlow policies within a maximum entropy RL framework: action likelihood evaluation and soft policy improvement. Experiments show that MFPO achieves performance comparable to or better than diffusion-based RL methods on MuJoCo and DeepMind Control Suite benchmarks, while significantly reducing training and inference time.
Ditch diffusion models: MeanFlow policies offer a faster, leaner path to high-performing reinforcement learning agents.
Diffusion models have recently emerged as expressive policy representations for online reinforcement learning (RL). However, their iterative generative processes introduce substantial training and inference overhead. To overcome this limitation, we propose to represent policies using MeanFlow models, a class of few-step flow-based generative models, to improve training and inference efficiency over diffusion-based RL approaches. To promote exploration, we optimize MeanFlow policies under the maximum entropy RL framework via soft policy iteration, and address two key challenges specific to MeanFlow policies: action likelihood evaluation and soft policy improvement. Experiments on MuJoCo and DeepMind Control Suite benchmarks demonstrate that our method, Mean Flow Policy Optimization (MFPO), achieves performance comparable to or exceeding current diffusion-based baselines while considerably reducing training and inference time. Our code is available at https://github.com/MFPolicy/MFPO.
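To make the efficiency claim concrete, below is a minimal sketch of one-step action sampling with a MeanFlow policy head. A MeanFlow model learns the average velocity u(z_t, r, t) over an interval [r, t], so a sample can be drawn in a single step via z_r = z_t - (t - r) * u(z_t, r, t), instead of iterating many denoising steps as a diffusion policy would. The class `MeanFlowPolicy`, its layer sizes, and the conditioning layout are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn


class MeanFlowPolicy(nn.Module):
    """Hypothetical one-step MeanFlow policy head (illustrative sketch).

    The network predicts the *average* velocity u(z, s, r, t) over the
    interval [r, t], conditioned on the state s. Sampling then needs a
    single network evaluation: a = z_1 - (t - r) * u(z_1, s, r, t).
    """

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.action_dim = action_dim
        # Input: noisy action z, state s, and the two interval endpoints r, t.
        self.net = nn.Sequential(
            nn.Linear(action_dim + state_dim + 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def avg_velocity(self, z, s, r, t):
        # Predicted average velocity over [r, t], conditioned on state s.
        return self.net(torch.cat([z, s, r, t], dim=-1))

    @torch.no_grad()
    def act(self, s: torch.Tensor) -> torch.Tensor:
        # One-step generation: start from pure noise at t=1 and jump
        # directly to the action at r=0 using the learned average velocity.
        batch = s.shape[0]
        z1 = torch.randn(batch, self.action_dim, device=s.device)
        r = torch.zeros(batch, 1, device=s.device)
        t = torch.ones(batch, 1, device=s.device)
        return z1 - (t - r) * self.avg_velocity(z1, s, r, t)
```

Because sampling is a single forward pass rather than a multi-step reverse process, both environment interaction and the policy-improvement step become cheaper, which is the source of the training- and inference-time savings the abstract reports.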