Reasoning LLM judges can inadvertently teach policies to generate adversarial outputs that game the evaluation system, highlighting a critical challenge in aligning LLMs for non-verifiable tasks.