LLMs aren't equally reliable as NLG evaluators, but a Bradley-Terry extension called BT-sigma can learn judge reliability from pairwise comparisons alone, improving ranking accuracy without human supervision.