Southeast University, Nanyang Technological University
Leveraging hidden states from reward models can boost RLHF performance by over 6% on challenging benchmarks, transforming how we utilize reward signals.
Merging concrete visual rollouts with abstract reasoning yields performance gains of 10.6% and 10.9% on challenging reasoning benchmarks, showcasing the power of hybrid models.