Zhejiang University
LLMs often fail to maintain accurate beliefs in multi-turn interactions, but targeted reinforcement learning and representation steering can dramatically improve their contextual reasoning.
RL-trained LLM agents can get stuck in an "information self-locking" trap, failing both to ask the right questions and to internalize the information they receive, but a simple reallocation of the learning signal can break them out.