Department of Computer Science and Technology, Tsinghua University
Current LLM judges show a troubling reliability gap in long-form evaluations, raising questions about their effectiveness in real-world applications.