University of Queensland, Michigan State University
LLMs aren't ready to replace human judges in relevance assessment, as they consistently inflate relevance scores and are easily swayed by superficial cues like passage length.
LLMs may ace the test, but their uncertainty estimates are far from perfect, raising serious concerns about their reliability in high-stakes educational assessments.