Pengkun Wang

Shifting credit assignment to fine-grained decision points boosts agentic RL performance by nearly 4 points, challenging the conventional focus on tool-call boundaries.

Xucong Wang, Ziyu Ma, Yong Wang +5

RLHF & Preference Learning, Tool Use & Agents

Jun 9, 2026

DAMO, 6d ago

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Bootstrapping LLM agents to co-evolve as both agent and environment can improve performance on complex tasks by over 4% on average.

Xucong Wang, Ziyu Ma, Shidong Yang +4

Scalable Oversight & Alignment Theory, Tool Use & Agents

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (3)