Forget fixed temperature schedules: TAMPO learns to adapt temperature on the fly, boosting LLM reinforcement learning performance on mathematical reasoning tasks.