Stuck training your reasoning model with RLVR due to a low initial success rate? This paper shows how a Tsallis q-logarithm loss can jumpstart learning by adaptively amplifying gradients, achieving a +14.4 point boost over GRPO on HotPotQA.
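The post does not spell out the loss itself, so the following is only a minimal sketch of why a q-logarithm can amplify gradients when success is rare. It uses the standard Tsallis definition, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural log as q approaches 1 and whose derivative is x^(-q). The function names `q_log` and `q_log_grad`, and the reading of `x` as a success probability, are assumptions made for illustration and are not taken from the paper.

```python
import math


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm of a positive x; equals math.log(x) at q = 1."""
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_log_grad(x: float, q: float) -> float:
    """Derivative of the q-logarithm with respect to x, which is x**(-q)."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x ** (-q)


if __name__ == "__main__":
    # Hypothetical success probability of 1%: compare gradient scale across q.
    p = 0.01
    for q in (0.0, 0.5, 1.0):
        print(f"q={q}: ln_q(p)={q_log(p, q):.4f}, d/dp={q_log_grad(p, q):.4f}")
```

Under these assumptions, q = 0 gives a derivative of 1 regardless of p (a plain linear objective in p), while larger q scales the derivative by p^(-q), so the rarer the success, the larger the amplification.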