Xuefeng Bai

School of Computer Science and Technology, Harbin Institute of Technology, China

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Reasoning & Chain-of-Thought (1), RLHF & Preference Learning (1)

Frequent co-authors

Mufan Xu (1), Kehai Chen (1), Zhengyu Niu (1), Muyun Yang (1)

Papers (1)

Feb 16, 2026 · also Tsinghua AI

Beyond Token-Level Policy Gradients for Complex Reasoning with Large Language Models

Token-level policy gradients fall short on complex reasoning tasks, but treating sequences of tokens as unified actions can significantly boost performance on mathematical and coding benchmarks.

Mufan Xu, Kehai Chen, Xuefeng Bai +3

Reasoning & Chain-of-Thought · RLHF & Preference Learning
