Meituan, Peking University {fuxiaoliang04, linjiaye, fangyangyi}@meituan.com
MASPO is a new RLVR (reinforcement learning with verifiable rewards) method for LLM reasoning. It balances gradient utilization, probability mass, and signal reliability, with the aim of faster and more robust learning.