Forget hand-engineered reward functions: this method uses language models to learn factorized world states that generalize to new goals and environments, outperforming LLM-as-a-Judge in zero-shot reward prediction.