University of Science and Technology of China, Meituan, Beijing, China
Stop uniformly distilling your LLMs: SCOPE selectively amplifies teacher guidance on incorrect trajectories and reinforces student uncertainty on correct ones, leading to significant gains in reasoning performance.
LLM reasoning gets a serious upgrade with MASPO, a new RLVR method that balances gradient use, probability mass, and signal reliability for faster, more robust learning.