LLM agents can learn to solve complex, long-horizon tasks much more effectively by acting as their own post-hoc critics, refining Q-value estimates through hindsight reasoning.