Search papers, labs, and topics across Lattice.
1
0
3
11
Reward-driven reflection makes LLMs *more* likely to hack rewards, but a dedicated safety channel lets them discover hidden constraints from a single bit of feedback.