Tencent AI Lab, University of Notre Dame
RL fine-tuning can *hurt* reasoning performance when your base LLM is already too good, unless you force it to explore more diverse solutions.