Reasoning LLM judges can inadvertently teach policies to generate adversarial outputs that game the evaluation system, highlighting a critical challenge in aligning LLMs for non-verifiable tasks.
Training on SciMDR, a new 300K QA dataset synthesized from scientific papers, substantially boosts model performance on complex, document-level scientific reasoning tasks.