Reward hacking isn't just about incentives; it's about wild directional swings in your model's parameter space, and constraining those swings can keep your LM on the straight and narrow.
Intelligent budget management in LLM agents can outperform brute-force compute scaling by 4x, thanks to a new search algorithm that prunes redundant steps and focuses on promising trajectories.