Ditch reward models: Nash Mirror Prox converges quickly and stably to a Nash equilibrium directly from human preference data, sidestepping the limitations of reward-model-based RLHF.