Search papers, labs, and topics across Lattice.
2
0
6
4
On-policy RL for machine learning engineering agents is now practical, thanks to a synthetic sandbox that slashes execution time by 13x while boosting performance by up to 67%.
Achieve significant reasoning gains in frozen LLMs (+22.4%) without retraining by adaptively routing reward model guidance at the token level during inference.