Stop hand-crafting RLHF curricula: ACTOR-CURATOR learns to dynamically select training problems, boosting performance by up to 30% and speeding up training by 80% on challenging reasoning tasks.