LLMs trained with Vector Policy Optimization (VPO) learn to produce diverse solutions that solve previously unsolvable problems in evolutionary search, outperforming models optimized for a single scalar reward.
Forget brute-force scaling: crafting the *right* context from past experiences unlocks surprisingly large gains in LLM agent performance.