Jingyuan Zhang

Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates comments that beat Qwen3-14B by up to 13% on standard metrics.

Lei Yu, Jingyuan Zhang, Xin Wang +4

Code Generation & Program Synthesis · Natural Language Processing · RLHF & Preference Learning

Meta AI · Mar 19, 2026 · also CMU ML, CAS, UESTC, UNC +1

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

On-policy reward modeling with LLM judges not only unlocks significant performance gains on complex mathematical reasoning tasks, but also generalizes to improve performance on simpler numerical and multiple-choice benchmarks.

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim +20

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Jingyuan Zhang

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (3)