Zhejiang University
Instead of imitating reflections, LLM agents can be trained to reason about action quality by rewarding correct judgments between alternative actions, leading to improved performance and generalization.
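
The idea of rewarding correct judgments between alternative actions can be illustrated with a minimal sketch. This is not the paper's implementation: the `ActionPair` type, the `judgment_reward` function, the outcome scores, and the binary 0/1 reward are all hypothetical, chosen only to show what a reward for a correct pairwise judgment might look like.

```python
from dataclasses import dataclass


@dataclass
class ActionPair:
    """Two candidate actions for the same state (hypothetical structure)."""
    action_a: str
    action_b: str
    outcome_a: float  # assumed quality signal for action_a, e.g. task return
    outcome_b: float  # assumed quality signal for action_b


def judgment_reward(pair: ActionPair, predicted_better: str) -> float:
    """Return 1.0 if the agent judged the higher-outcome action as better, else 0.0.

    `predicted_better` is "a" or "b": the agent's judgment after reasoning
    about the two alternatives.
    """
    if pair.outcome_a == pair.outcome_b:
        # Assumption: tied outcomes carry no preference signal.
        return 0.0
    truly_better = "a" if pair.outcome_a > pair.outcome_b else "b"
    return 1.0 if predicted_better == truly_better else 0.0


# Usage: the agent compares two actions and is rewarded only for a correct judgment.
pair = ActionPair("click search button", "scroll down", outcome_a=1.0, outcome_b=0.0)
reward_correct = judgment_reward(pair, "a")
reward_wrong = judgment_reward(pair, "b")
```

In this sketch the reward depends only on whether the judgment is correct, not on matching any reference reflection text, which is the contrast with imitation that the summary draws.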