Harbin Institute of Technology
LLMs can learn to explore beyond their initial latent space and achieve substantial gains in mathematical reasoning by unifying offline teacher guidance with online reinforcement learning, viewed through a reward-modeling lens.
Multi-agent systems gain 6.3% accuracy on math problems thanks to a new "rectify-or-reject" pruning method that dynamically filters out faulty information at test time.