$$\mathcal{L}_{\mathrm{actor}}(\theta)=-\mathbb{E}_{p_{\phi},\pi_{\theta}}\left[\sum_{t=1}^{H}\left(\mathrm{sg}\left(\frac{R^{\lambda}_{t}-v_{\psi}(s_{t})}{\max(1,S)}\right)\log\pi_{\theta}(a_{t}\,|\,s_{t})+\eta\,\mathrm{H}\left[\pi_{\theta}(a_{t}\,|\,s_{t})\right]\right)\right] \qquad (7)$$

where $S$ is a dynamically scaled normalizer computed as an exponential moving average of the 5th–95th percentile range of returns, i.e., $S \doteq \operatorname{EMA}\left(\operatorname{Per}(R^{\lambda}_{t},95)-\operatorname{Per}(R^{\lambda}_{t},5),\,0.99\right)$, which improves robustness to outliers across diverse environments.

4 Experiments

In this section, we conduct a series of experiments to validate the core claims of our work: that R2-Dreamer learns high-quality representations in a decoder-free and DA-free manner, leading to a framework that is not only computationally efficient but also highly performant. Our evaluation is structured to answer the following key questions:

1. How does R2-Dreamer perform against leading decoder-based and decoder-free agents on standard continuous control benchmarks? (Sec. 4.2, Sec. 4.3)
2. How does our internal regularization handle challenging scenarios where task-relevant information is subtle and easily missed by competing methods? (Sec. 4.4)
3. How does the learned representation qualitatively differ from baselines in focusing on task-relevant information? (Sec. 4.5)
4. What is the direct impact of our proposed redundancy reduction objective compared to other design choices, particularly DA? (Sec. 4.6)
5. What are the computational benefits of its decoder-free and DA-free design in practice? (Sec. 4.7)

We report task scores on DMC and DMC-Subtle and success rates on Meta-World, summarizing results with mean and median across tasks, and provide detailed per-task curves in the appendix.
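To make Eq. (7) concrete, the following is a minimal PyTorch sketch of the percentile-based return normalizer and the normalized actor loss. The class and function names (`ReturnNormalizer`, `actor_loss`) and the entropy coefficient value are illustrative assumptions, not the paper's actual implementation.

```python
import torch


class ReturnNormalizer:
    """EMA of the 5th-95th percentile range of lambda-returns (decay 0.99),
    giving the scale S in Eq. (7). A sketch; names are hypothetical."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.range = None  # running estimate of Per(R, 95) - Per(R, 5)

    def update(self, lam_returns: torch.Tensor) -> float:
        # Percentile range of the current batch of lambda-returns.
        lo, hi = torch.quantile(lam_returns.flatten(),
                                torch.tensor([0.05, 0.95]))
        r = (hi - lo).item()
        # Exponential moving average with decay 0.99.
        self.range = r if self.range is None else (
            self.decay * self.range + (1.0 - self.decay) * r)
        # The loss divides by max(1, S), so small ranges do not amplify noise.
        return max(1.0, self.range)


def actor_loss(log_probs, entropies, lam_returns, values, scale, eta=3e-4):
    # sg((R^lambda - v) / max(1, S)): the advantage is detached (stop-gradient),
    # so gradients flow only through log pi and the entropy bonus H.
    adv = ((lam_returns - values) / scale).detach()
    return -(adv * log_probs + eta * entropies).mean()
```

The `detach()` call implements the stop-gradient `sg(...)` in Eq. (7); dividing by `max(1, S)` keeps advantage magnitudes comparable across environments with very different return scales without shrinking already-small returns.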
In all experiments, we train over five random seeds, with 10 evaluation episodes per seed, and, unless otherwise stated, use the same hyperparameter configuration (see Appendix F) across all tasks and benchmark suites.

4.1 Experimental Setup

Baselines. We compare R2-Dreamer against a carefully selected set of competitive baselines covering the main paradigms of image-based reinforcement learning:

• R2-Dreamer (ours): Implemented on top of our PyTorch-based DreamerV3 reproduction. This unified codebase is used for all decoder-free variants to ensure that performance differences are directly attributable to the representation learning objective.

• DreamerV3 (Hafner et al., 2025): A leading and highly competitive decoder-based world model. To provide one of the strongest and most credible baselines, we use the authors' official JAX implementation as our primary point of comparison, using the latest version, which includes several algorithmic improvements made in April 2024.¹

¹https://github.com/danijar/dreamerv3
Meta AI (FAIR)