Mar 9, 2026, arXiv:2603.08588

Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

Riccardo De Monte, Matteo Cederle, Gian Antonio Susto

AI Summary

This paper introduces Streaming Soft Actor-Critic (S2AC) and Streaming Deterministic Actor-Critic (SDAC), two novel streaming deep RL algorithms tailored for continuous control tasks and on-device finetuning. These algorithms are designed to be compatible with state-of-the-art batch RL methods, facilitating Sim2Real transfer. The proposed methods achieve performance comparable to existing streaming baselines on standard benchmarks without extensive hyperparameter optimization.

Key Contribution

S2AC and SDAC are streaming counterparts of state-of-the-art batch actor-critic methods. They match existing streaming baselines in continuous control without tedious hyperparameter tuning, which makes them well suited to on-device finetuning and Sim2Real transfer.

Abstract

State-of-the-art deep reinforcement learning (RL) methods have achieved remarkable performance in continuous control tasks, yet their computational complexity is often incompatible with the constraints of resource-limited hardware, due to their reliance on replay buffers, batch updates, and target networks. The emerging paradigm of streaming deep RL addresses this limitation through purely online updates, achieving strong empirical performance on standard benchmarks. In this work, we propose two novel streaming deep RL algorithms, Streaming Soft Actor-Critic (S2AC) and Streaming Deterministic Actor-Critic (SDAC), explicitly designed to be compatible with state-of-the-art batch RL methods, making them particularly suitable for on-device finetuning applications such as Sim2Real transfer. Both algorithms achieve performance comparable to state-of-the-art streaming baselines on standard benchmarks without requiring tedious hyperparameter tuning. Finally, we further investigate the practical challenges of transitioning from batch to streaming learning during finetuning and propose concrete strategies to tackle them.
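To make the phrase "purely online updates" concrete, the following is a minimal sketch of a streaming update loop. It is a toy tabular TD(0) value estimator, not S2AC or SDAC; the two-state environment, the function name `streaming_td0`, and the hyperparameters are illustrative assumptions that do not come from the paper. The point is only the structure: each transition is used for one update and then discarded, with no replay buffer, no batch, and no target network.

```python
import random


def streaming_td0(num_steps=20000, alpha=0.02, gamma=0.9, seed=0):
    """Toy streaming TD(0) on a hypothetical 2-state chain.

    Transitions are uniform at random; landing in state 1 pays reward 1.
    Every transition updates the estimate once and is then thrown away.
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]  # tabular value estimates, one per state
    state = 0
    for _ in range(num_steps):
        next_state = rng.randrange(2)
        reward = 1.0 if next_state == 1 else 0.0
        # The bootstrap target uses the live estimates (no target network).
        td_error = reward + gamma * values[next_state] - values[state]
        # Single-sample update: nothing is stored for later replay.
        values[state] += alpha * td_error
        state = next_state
    return values


if __name__ == "__main__":
    print(streaming_td0())
```

In this toy problem both states have the same true value, V = 0.5 / (1 - 0.9) = 5, so the two estimates should fluctuate around 5. A batch method would instead store transitions in a buffer and average gradients over sampled minibatches, which is the memory and compute cost the abstract refers to.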

Robotics & Embodied AI, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
