Search papers, labs, and topics across Lattice.
University of Central Florida
2
2
4
8
Compressing multi-dimensional human preferences into single binary labels severely degrades DPO training, but a semi-supervised approach can recover state-of-the-art performance without additional human annotation.
Scaling visual preference optimization hinges on data quality, as a massive, high-resolution dataset renders complex optimization algorithms unnecessary.