BITS Pilani, India
LLM judges can be subtly manipulated by framing the consequences of their decisions, producing biased evaluations even when the content being judged is held constant.
LLM judges are far less reliable on individual examples than aggregate metrics suggest: up to 67% of documents show judgment inconsistencies, and some criteria like fluency are essentially unjudgeable.