Harbin Institute of Technology
LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance and online reinforcement learning with a specialized reward modeling lens.
Generative recommendation's cold-start gains are often illusory, inflated by inconsistent evaluation and confounding design choices like model scale and identifier design.