Improbable AI Lab
LLMs trained with Vector Policy Optimization (VPO) learn to produce diverse solutions that let evolutionary search solve previously unsolvable problems, outperforming models optimized for a single scalar reward.