Search papers, labs, and topics across Lattice.
2
0
5
2
Test-time training can finally scale for large reasoning models: TEMPO unlocks sustained performance gains by interleaving policy refinement with periodic critic recalibration, boosting accuracy by over 18% on challenging benchmarks.
LLMs can be taught to "think longer" and explore more diverse reasoning paths in-context via a simple length-incentivized reward, leading to improved generalization.