Hithink RoyalFlush Information Network
LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance and online reinforcement learning with a specialized reward modeling lens.