Mingzhou Xu

Hithink RoyalFlush Information Network

Research focus

Reasoning & Chain-of-Thought (1) · RLHF & Preference Learning (1) · Tool Use & Agents (1)

Frequent co-authors

Xinyu Ma (1) · Chang Jin (1) · Qiang Wang (1)

Papers (1)

Apr 20, 2026·also Hithink RoyalFlush Information Network, UMacau

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning when offline teacher guidance and online reinforcement learning are unified through a specialized reward-modeling approach.

Xinyu Ma, Mingzhou Xu, Chang Jin +1

Reasoning & Chain-of-Thought · RLHF & Preference Learning · Tool Use & Agents