Minjun Zhu

Papers on Lattice

Research focus

Reasoning & Chain-of-Thought (1), RLHF & Preference Learning (1), Training Efficiency & Optimization (1)

Frequent co-authors

Haocheng Lu (1), Henry Yu (1)

Papers (1)

Dec 17, 2025

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

Forget expensive reward models: this work shows how a compact MathVerifier can guide DPO to significantly improve mathematical reasoning in small language models by mining hard negatives and weighting preference pairs.

Haocheng Lu, Minjun Zhu, Henry Yu

Reasoning & Chain-of-Thought, RLHF & Preference Learning, Training Efficiency & Optimization
