Wenxi Zhu

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Reasoning & Chain-of-Thought (1) · RLHF & Preference Learning (1)

Frequent co-authors

Renjie Mao (1), Lvfang Tao (1), Yixin Ding (1), Yu Shi (1)

Papers (1)

Jun 9, 2026

Renjie Mao +8 · Jun 9, 2026 · also Meta AI

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Errors in early tokens of an LLM's output can compound over the rest of the sequence; CPPO's position-sensitive approach addresses this, boosting reasoning accuracy and training stability.

Renjie Mao, Lvfang Tao, Yixin Ding +6

Reasoning & Chain-of-Thought · RLHF & Preference Learning
