Standard RL critics for LLMs are basically useless, but these two simple methods can fix them.
RL-trained LLM agents can get stuck in an "information self-locking" trap, failing to ask the right questions and internalize information, but a simple learning signal reallocation can break them out.