Xinyu Ma

LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance and online reinforcement learning with a specialized reward modeling lens.

Xinyu Ma, Mingzhou Xu, Chang Jin

Reasoning & Chain-of-Thought · RLHF & Preference Learning · Tool Use & Agents

Mar 31, 2026

Mar 31, 2026 · also Baidu, Leiden, McMaster University, UvA

Cold-Starts in Generative Recommendation: A Reproducibility Study

Generative recommendation's cold-start gains are often illusory, inflated by inconsistent evaluation and confounding design choices like model scale and identifier design.

Zhen Zhang, Jujia Zhao, Xinyu Ma +4

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Papers (3)