Rather than imitating reflection traces, LLM agents can be trained to reason about action quality directly, rewarded for correctly judging between alternative actions, which improves both performance and generalization.
Optimizing LLMs for generating multiple attempts (pass@k) can actually *hurt* their ability to get it right on the first try (pass@1) due to subtle prompt interference effects.