The Hong Kong University of Science and Technology (Guangzhou), Huawei Technologies Ltd
RLVR's reasoning gains hinge on high-entropy tokens, revealing a critical inefficiency in uniform reward broadcast that EAPO effectively addresses.