Yang Zhang

Figure 6: Performance of AQA models on the FLEX dataset, measured by Spearman's ρ (↑, higher is better) and R-ℓ2 (×100, ↓, lower is better). Single view (yu2021group): ρ = 0.8069.


Research focus

Natural Language Processing (1), RLHF & Preference Learning (1)

Frequent co-authors

Chenjia Bai (1), Shuang Qiu (1), Qiaosheng Zhang (1), Kang Xu (1)

Papers (1)

Jan 22, 2025


Online Preference Alignment for Language Models via Count-based Exploration

LLMs can learn more effectively from human feedback by exploring more widely, using a simple coin-flip counting method that encourages them to try responses they have rarely produced.

Chenjia Bai, Yang Zhang, Shuang Qiu +3

Natural Language Processing, RLHF & Preference Learning
