East China Normal University
Skill0.5 achieves state-of-the-art out-of-distribution generalization in agentic RL by combining skill internalization with skill utilization, outperforming methods that rely on only one of the two.
Decomposing GUI agent trajectories into verifiable milestones and auditing the evidence chain yields a 10% boost in RL training performance, outperforming single-judge reward systems.
Forget outdated benchmarks: LR-bench offers a fresh dataset from 2024-2025 for reviewer assignment, and RATE uses reviewer profiles to achieve state-of-the-art matching.