Beijing Academy of Artificial Intelligence
Forget external teachers: the best way to boost your RL policy might be to learn from its future self.
RL fine-tuning of discrete diffusion models can be made dramatically more stable and effective by treating the final denoised sample as the action and reconstructing trajectories using the forward diffusion process.
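The trajectory-reconstruction idea can be illustrated with a toy sketch. Assuming an absorbing-state (masked) discrete diffusion model, the final denoised sample `x0` is treated as the action, and a denoising trajectory is recovered by running the *forward* masking process on `x0`; the policy log-probability is then the sum of log-probs of the tokens revealed at each step. All names (`forward_mask_trajectory`, `trajectory_logprob`, the uniform stand-in denoiser) and constants are hypothetical, not from the paper:

```python
import numpy as np

VOCAB, LENGTH, STEPS = 8, 6, 4
MASK = VOCAB  # absorbing mask token id, outside the normal vocabulary

def forward_mask_trajectory(x0, steps, rng):
    """Reconstruct a denoising trajectory from the final sample x0 by
    simulating the forward (masking) diffusion process.  Masking an
    unmasked token with prob 1/(steps - t + 1) at step t gives the usual
    marginal of t/steps masked tokens; masks are absorbing."""
    traj = [x0.copy()]
    x = x0.copy()
    for t in range(1, steps + 1):
        hit = rng.random(LENGTH) < 1.0 / (steps - t + 1)
        x = np.where(hit, MASK, x)
        traj.append(x.copy())
    return traj[::-1]  # reversed: fully masked -> ... -> x0

def trajectory_logprob(traj, probs_fn):
    """Log-prob of the reconstructed trajectory under the denoiser:
    sum the log-probs of tokens revealed at each denoising step.
    Multiplying this by a reward on x0 gives a REINFORCE-style
    surrogate where x0 is the action."""
    lp = 0.0
    for s_prev, s_next in zip(traj[:-1], traj[1:]):
        revealed = (s_prev == MASK) & (s_next != MASK)
        probs = probs_fn(s_prev)          # (LENGTH, VOCAB) per-position dist
        idx = np.where(revealed)[0]
        lp += np.log(probs[idx, s_next[idx]]).sum()
    return lp

# Stand-in denoiser: uniform distribution over the vocabulary.
uniform = lambda s: np.full((LENGTH, VOCAB), 1.0 / VOCAB)

rng = np.random.default_rng(0)
x0 = rng.integers(0, VOCAB, LENGTH)   # the final denoised sample ("action")
traj = forward_mask_trajectory(x0, STEPS, rng)
lp = trajectory_logprob(traj, uniform)
```

Because the forward process is absorbing, the reversed trajectory starts fully masked and only ever reveals tokens, so every position contributes exactly one log-prob term; under the uniform stand-in, `lp` equals `LENGTH * log(1/VOCAB)`.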