A 106B-parameter model can outperform a 1T-parameter model on long-horizon reasoning tasks, thanks to a novel training pipeline that distills knowledge from research papers and combines trajectory-splitting supervised fine-tuning (SFT) with progressive reinforcement learning (RL).