Xidian University
LLM-generated rewards in RL can hurt performance when deployed at the wrong training stage; this competence-aware verification method can help avoid that.