Mingli Song

This research is supported by the RIE2025 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) (Award I2301E0026), administered by A*STAR, and by Alibaba Group and NTU Singapore through the Alibaba-NTU Global e-Sustainability CorpLab (ANGEL). (Corresponding author: Dacheng Tao.)

Shunyu Liu, Junjie Zhang, Rongcheng Tu, and Dacheng Tao are with Nanyang Technological University, Singapore (e-mail: shunyu.liu@ntu.edu.sg; junjie.zhang@ntu.edu.sg; turongcheng@gmail.com; dacheng.tao@ntu.edu.sg).

Wenkai Fang, Yang Zhou, Kongcheng Zhang, and Mingli Song are with the College of Computer Science and Technology, Zhejiang University, China (e-mail: wenkfang@zju.edu.cn; imzhouyang@zju.edu.cn; zhangkc@zju.edu.cn; brooksong@zju.edu.cn).

Zetian Hu is with the School of Aerospace Engineering, Tsinghua University, China (e-mail: huzt22@mails.tsinghua.edu.cn).

Ting-En Lin, Fei Huang, and Yongbin Li are with the Tongyi Lab, Alibaba Group, China (e-mail: ting-en.lte@alibaba-inc.com; f.huang@alibaba-inc.com; shuide.lyb@alibaba-inc.com).

Tsinghua AI

Research focus

Constitutional AI & AI Ethics (1)

RLHF & Preference Learning (1)

Training Efficiency & Optimization (1)

Frequent co-authors

Shunyu Liu (1)

Wenkai Fang (1)

Zetian Hu (1)

Junjie Zhang (1)

Papers (1)

Tsinghua AI · Mar 12, 2025

A Survey of Direct Preference Optimization

DPO's rise as a computationally efficient alternative to RLHF for LLM alignment has spurred a diverse range of research, now systematically organized and analyzed in this comprehensive survey.

Shunyu Liu, Wenkai Fang, Zetian Hu +925

Constitutional AI & AI Ethics · RLHF & Preference Learning · Training Efficiency & Optimization
