Even the strongest LLM judges can be easily fooled by seemingly high-quality reasoning chains, highlighting a critical vulnerability in using LLMs to evaluate other LLMs.