The Hong Kong University of Science and Technology (Guangzhou); Huawei Technologies Ltd.
RLVR's reasoning gains hinge on high-entropy tokens, which makes uniform reward broadcast a critical inefficiency that EAPO addresses.
Achieve better accuracy in federated learning on imbalanced data, at low communication cost, by mimicking the brain's efficient knowledge integration.
Achieve stable continual learning without catastrophic forgetting by fixing classifier weights to an Equiangular Tight Frame and aligning features geometrically.