RLHF and DPO are surprisingly vulnerable to data poisoning: even a small number of carefully crafted preference pairs can steer the learned policy toward an attacker-chosen (potentially harmful) target.
RLHF models can be made significantly more robust to distribution shift by incorporating distributionally robust optimization into both reward modeling and policy optimization.
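As a loose illustration of the robustness idea above (not the paper's actual method), one common distributionally robust objective is a CVaR-style loss: instead of averaging the Bradley-Terry preference loss over all pairs, average it over only the worst α fraction, so the reward model must fit hard or shifted examples too. The function names and the α value below are assumptions for the sketch.

```python
# Minimal sketch of a CVaR-style distributionally robust loss for
# reward-model training on preference pairs. All names here are
# illustrative, not taken from any specific codebase.
import math

def bt_loss(reward_chosen, reward_rejected):
    # Standard Bradley-Terry preference loss: -log sigmoid(r_c - r_r).
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

def cvar_dro_loss(pairs, alpha=0.25):
    # pairs: list of (reward_chosen, reward_rejected) scalar scores.
    # Average the per-pair losses over the worst ceil(alpha * n) pairs,
    # i.e. the Conditional Value-at-Risk of the loss distribution.
    losses = sorted((bt_loss(rc, rr) for rc, rr in pairs), reverse=True)
    k = max(1, math.ceil(alpha * len(losses)))
    return sum(losses[:k]) / k

pairs = [(2.0, 0.0), (1.5, 1.0), (0.1, 0.9), (0.0, 2.0)]
average = sum(bt_loss(rc, rr) for rc, rr in pairs) / len(pairs)
robust = cvar_dro_loss(pairs, alpha=0.25)
# By construction, the CVaR loss upper-bounds the average loss.
assert robust >= average
```

Minimizing the CVaR loss pushes the model to perform well under any reweighting of the data concentrated on an α fraction, which is one standard way to hedge against distribution shift.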