Jason Zhu

Papers on Lattice


Frequent co-authors

Aman Gupta (1), Shao Tang (1), Qingquan Song (1), Sirou Zhu (1), Jiwoo Hong (1)

Papers (1)

2025


AlphaPO - Reward shape matters for LLM alignment

AlphaPO is a new direct alignment algorithm (DAA) that uses an α parameter to change the shape of the reward function beyond the standard log reward, giving fine-grained control over likelihood displacement and over-optimization.

Aman Gupta, Shao Tang, Qingquan Song +9
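To illustrate what "changing the shape of the reward" can mean, the sketch below uses an assumed α-family, f_α(u) = (1 - u^(-α))/α, where u is the policy-to-reference likelihood ratio. This family tends to the standard log reward ln u as α → 0. The family itself, the helper names alpha_reward, softplus and pairwise_loss, and the Bradley-Terry style loss with scale beta are illustrative assumptions, not the exact AlphaPO objective from the paper.

```python
import math


def alpha_reward(log_ratio: float, alpha: float) -> float:
    """Shaped reward f_alpha(u) = (1 - u**(-alpha)) / alpha, given log_ratio = ln(u).

    ASSUMPTION: this alpha-family is illustrative, not the exact AlphaPO reward.
    As alpha -> 0 it recovers the standard log reward ln(u).
    """
    if alpha == 0.0:
        return log_ratio  # alpha -> 0 limit: the standard log reward
    # 1 - u**(-alpha) == -expm1(-alpha * ln u); expm1 stays accurate for tiny alpha
    return -math.expm1(-alpha * log_ratio) / alpha


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def pairwise_loss(logr_chosen: float, logr_rejected: float,
                  alpha: float, beta: float = 1.0) -> float:
    """Hypothetical Bradley-Terry style loss: -log sigmoid(beta * reward margin)."""
    margin = beta * (alpha_reward(logr_chosen, alpha)
                     - alpha_reward(logr_rejected, alpha))
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    ratio = math.log(4.0)  # chosen response is 4x likelier than under the reference
    for a in (-0.5, 0.0, 0.5, 1.0):
        print(f"alpha={a:+.1f}  reward={alpha_reward(ratio, a):.4f}")
```

With u = 4, the shaped reward falls from 2.0 at α = -0.5 to 0.75 at α = 1, while α = 0 gives ln 4 ≈ 1.386. Under this assumed family, α > 0 bounds the reward above by 1/α, so very large likelihood ratios earn diminishing returns, and α < 0 makes the reward grow faster than the log.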
