Yueqing Sun

Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.

Tianhao Hu, Xiangcheng Liu, Youshao Xiao +12

Distributed Systems & Hardware, RLHF & Preference Learning, Training Efficiency & Optimization

Mar 11, 2026

Yikai Zhang +6

Mar 11, 2026

$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts

Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.

Yikai Zhang, Yueqing Sun, Hongyan Hao +4

Robotics & Embodied AI, Training Efficiency & Optimization, World Models & Planning



Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (3)