Huawei Technologies Ltd
RLVR's reasoning gains hinge on high-entropy tokens, so broadcasting a uniform reward to every token is inefficient; EAPO addresses this inefficiency.