Search papers, labs, and topics across Lattice.
M gradient steps with batch size 256256. We use a 44-layer MLP with hidden size 512512 for the base policies and 256256 for the critics. For the latent actors in LPS and DSRL, we use a 22-layer MLP with hidden size 256256. We carefully tune α\alpha for QC-FQL and QC-MFQL. For CFGRL, we use the best-reported CFG strength ww from [6]. Following common practice, we normalize the critic loss to have unit norm. V-B Experimental Results Figure 5: Performance on OGBench. We evaluate the success rates across tasks. Bars report the mean success rate over 33 seeds, and error bars indicate the 9595% confidence interval estimated using bootstrap resampling with
1
0
3
0
Ditching latent critics in offline RL unlocks state-of-the-art performance by directly backpropagating action-space gradients through a differentiable flow-based policy, enabling robust latent policy steering with minimal tuning.