Stop training LLMs on lucky guesses: this new RL method uses the model's own in-context learning ability to identify and upweight high-quality reasoning traces, leading to better performance.