Xingxing Wang

Meituan

Papers on Lattice

Total citations

Topics

h-index

Research focus

RLHF & Preference Learning (2), Reasoning & Chain-of-Thought (1), Recommendation & Information Retrieval (1), Tool Use & Agents (1)

Frequent co-authors

Qianlong Xie (2), Yongcan Yu (1), Lingxiao He (1), Jian Liang (1)

Papers (2)

Apr 23, 2026 · also Meituan

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Test-time RL's vulnerability to noisy pseudo-labels is amplified by group-relative advantage estimation, but can be mitigated with a surprisingly simple debiasing and denoising approach.

Yongcan Yu, Lingxiao He, Jian Liang +5

Reasoning & Chain-of-Thought, RLHF & Preference Learning

Tsinghua AI · Feb 23, 2026 · also CAS, Meituan

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Forget slow and steady: "Fast Thinking" prompts, combined with carefully tuned reward functions and REINFORCE, can dramatically boost the performance of RL-trained research agents.

Yinuo Xu, Shuo Lu, Jianjie Cheng +5

Recommendation & Information Retrieval, RLHF & Preference Learning, Tool Use & Agents