Ankan Saha

Papers on Lattice

Total citations

Topics

h-index

Research focus

Constitutional AI & AI Ethics (1), RLHF & Preference Learning (1)

Frequent co-authors

Aman Gupta (2), Shao Tang (2), Qingquan Song (2), Sirou Zhu (2)

Papers (2)

2025

Aman Gupta +11 · 2025

AlphaPO - Reward shape matters for LLM alignment

AlphaPO is a new Direct Alignment Algorithm (DAA) that uses an α-parameter to change the shape of the reward function beyond the standard log reward, giving fine-grained control over likelihood displacement and over-optimization.

Aman Gupta, Shao Tang, Qingquan Song +9

Jan 7, 2025

Aman Gupta +12 · Jan 7, 2025

AlphaPO: Reward Shape Matters for LLM Alignment

AlphaPO unlocks 7-50% better LLM alignment by showing that reward function *shape* is a surprisingly powerful lever in Direct Alignment Algorithms.

Aman Gupta, Shao Tang, Qingquan Song +10 · 11

Constitutional AI & AI Ethics, RLHF & Preference Learning
