The Hong Kong University of Science and Technology (Guangzhou); Huawei Technologies Ltd.
RLVR's reasoning gains hinge on high-entropy tokens, revealing a critical inefficiency in uniform reward broadcast that EAPO effectively addresses.
Navigate sprawling, multi-floor environments without drowning in grid maps: osmAG-Nav slashes planning latency by up to 7816x using a hierarchical semantic approach.
Achieve up to a 102% improvement in Sharpe ratio and a 17.5% gain in directional accuracy by unifying event-centric data construction and decision-oriented fine-tuning with a hierarchical gated reward model.