Derek F. Wong

University of Macau

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Reasoning & Chain-of-Thought (1) · RLHF & Preference Learning (1) · Tool Use & Agents (1)

Frequent co-authors

Xinyu Ma (1), Mingzhou Xu (1), Xuebo Liu (1), Chang Jin (1)

Papers (1)

Apr 20, 2026

6d ago · also Hithink RoyalFlush Information Network, UMacau

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance and online reinforcement learning with a specialized reward modeling lens.

Xinyu Ma, Mingzhou Xu, Xuebo Liu +3

Reasoning & Chain-of-Thought · RLHF & Preference Learning · Tool Use & Agents
