The Hong Kong University of Science and Technology (Guangzhou)
Standard RL rollouts can effectively provide world modeling supervision, leading to significant performance gains in language agents.
Instruction following in large reasoning models gets a serious upgrade with RAIN-Merging, a gradient-free technique that merges in instruction-tuned capabilities without wrecking the model's ability to think step by step.