Jun 9, 2026

arXiv:2606.10613

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

Thanh Nguyen, Tri Ton, Hongbin Choe, Tung M. Luu, Chang D. Yoo

AI Summary

This paper introduces Bootstrapped Flow Q-Learning (BFQ), a novel framework for offline reinforcement learning that facilitates accurate single-step action generation without the need for auxiliary networks or distillation. By leveraging a divide-and-conquer approach to learn short-range displacements from the Flow Matching marginal velocity, BFQ effectively eliminates the reliance on multi-step denoising, leading to a more efficient and robust learning process. Extensive evaluations on D4RL benchmarks reveal that BFQ not only enhances performance but also significantly reduces computational costs compared to traditional multi-step diffusion methods.

Key Contribution

Single-step action generation can outperform multi-step diffusion methods in offline reinforcement learning, achieving higher performance with lower computational costs.

Abstract

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline reinforcement learning.
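The divide-and-conquer view of the displacement can be illustrated with a small numerical sketch. This is not the authors' implementation: it trains no network and uses no Q-function. It only shows the composition idea the abstract relies on, namely that a long-range displacement over [s, t] equals the displacement over the first half plus the displacement over the second half taken from the intermediate point, and that sufficiently short displacements are well approximated by velocity times step size. The toy field `velocity` (dx/dt = -x), the function `displacement`, and the `depth` parameter are all assumptions introduced for illustration.

```python
import numpy as np


def velocity(x, t):
    # Hypothetical toy velocity field (assumption, not from the paper): dx/dt = -x.
    return -x


def displacement(x, s, t, depth):
    """Approximate the displacement x(t) - x(s) by recursive bisection.

    At depth 0 the interval is treated as short-range and the displacement
    is estimated directly from the velocity. Otherwise the interval is split
    at its midpoint and the two half-range displacements are composed.
    """
    if depth == 0:
        return velocity(x, s) * (t - s)
    m = 0.5 * (s + t)
    first = displacement(x, s, m, depth - 1)
    second = displacement(x + first, m, t, depth - 1)
    return first + second


x0 = np.array([1.0, -2.0])
# For dx/dt = -x the exact displacement from time 0 to 1 is x0 * (exp(-1) - 1).
exact = x0 * (np.exp(-1.0) - 1.0)

coarse = displacement(x0, 0.0, 1.0, depth=0)  # one velocity-based step over the whole range
fine = displacement(x0, 0.0, 1.0, depth=8)    # composed from 2**8 short-range pieces

err_coarse = np.max(np.abs(coarse - exact))
err_fine = np.max(np.abs(fine - exact))
print(err_coarse, err_fine)
```

In this toy setting, the estimate composed from short-range pieces is far closer to the exact displacement than a single velocity-based step over the full interval. A learned single-step mapping, as described in the abstract, would presumably be trained to reproduce such a composed target in one evaluation; how BFQ constructs and uses those targets is not specified in this summary.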

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
