Test-time training can finally scale for large reasoning models: TEMPO unlocks sustained performance gains by interleaving policy refinement with periodic critic recalibration, boosting accuracy by over 18% on challenging benchmarks.
A new 32B code LLM trained specifically for industrial tasks crushes existing models on specialized domains like chip design and GPU kernel optimization, while remaining competitive on general coding benchmarks.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.