On-policy RL for machine learning engineering agents is now practical, thanks to a synthetic sandbox that slashes execution time by 13x while boosting performance by up to 67%.
LLMs still struggle to accurately infer user interests from interaction histories, especially when those histories mix diverse engagement signals. This is a critical gap for effective personalization.
Achieve significant reasoning gains (+22.4%) in frozen LLMs without retraining, by adaptively routing reward-model guidance at the token level during inference.