University of Macau
LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance and online reinforcement learning with a specialized reward modeling lens.