Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.
Agentic RL can now outperform proprietary LLMs and torch.compile in the challenging domain of CUDA kernel generation, achieving up to 40% speedups on hard tasks.