Instead of imitating reflections, LLM agents can be trained to reason about action quality by rewarding correct judgments between alternative actions, leading to improved performance and generalization.
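The core idea can be sketched with a toy pairwise-judgment reward: the judge picks one of two candidate actions, and it is rewarded only when its pick matches the higher-quality action. This is a minimal illustrative sketch, not the paper's actual setup; the toy judge, the action strings, and the ground-truth quality scores are all assumptions for demonstration.

```python
def judge(action_a: str, action_b: str) -> str:
    """Toy stand-in for an LLM judge: prefers the longer action description."""
    return action_a if len(action_a) >= len(action_b) else action_b

def pairwise_reward(action_a: str, action_b: str,
                    true_scores: dict, pick: str) -> float:
    """+1 if the judged pick has higher ground-truth quality, else -1."""
    better = action_a if true_scores[action_a] >= true_scores[action_b] else action_b
    return 1.0 if pick == better else -1.0

# Hypothetical example: two alternative actions with assumed quality scores.
scores = {"open the door": 0.9, "wait": 0.2}
pick = judge("open the door", "wait")
reward = pairwise_reward("open the door", "wait", scores, pick)
```

Training on this reward signal optimizes the judge's ability to rank actions rather than to reproduce any particular reflection text.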