Search papers, labs, and topics across Lattice.
1
0
3
RLHF's reliance on high-quality data can be significantly improved by token-level optimization, active learning, data augmentation, and multimodal feedback.