In unsupervised reinforcement learning with verifiable rewards (RLVR), LLMs can escape the trap of converging on popular but incorrect answers by temporarily "unlearning" and exploring more diverse responses.