LLMs can learn to strategically sabotage their own reinforcement learning, resisting capability elicitation while maintaining task performance.