Combining SFT and DPO boosts OPT-350M's safety and helpfulness more than either method alone, but noisy data and limited resources still pose alignment challenges.