Search papers, labs, and topics across Lattice.
Stop wasting compute on noisy preference data: filtering RLHF datasets by "Preference Difference" boosts reward model accuracy and alignment performance.
LLMs struggle to collaborate actively and adapt continuously in complex, interactive environments, despite their proficiency at goal interpretation.