CPPO redefines token-level trust regions in LLM reinforcement learning, leading to substantial gains in reasoning accuracy and training stability.