University of Science and Technology of China
RL fine-tuning of hybrid autoregressive-diffusion models can be made significantly more stable and effective by averaging gradients across multiple diffusion trajectories and filtering autoregressive tokens for consistency.