Sword Health, Instituto de Telecomunicações, Universidade de Lisboa
Even with objective, programmatically verifiable rubrics, LLM judges are 50% more likely to incorrectly favor their own outputs, revealing a persistent self-preference bias that skews LLM evaluations.
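The headline number is a relative rate. "50% more likely" means that the judge's error rate in one condition is about 1.5 times its error rate in a baseline condition. The sketch below shows that arithmetic under assumptions of our own, which are not taken from the paper: each judgment is pairwise, the rubric marks one response as objectively worse, and the baseline comes from pairs in which neither response was written by the judge. The record type `PairJudgment`, the group labels `"own_worse"` and `"third_party"`, and both helper functions are hypothetical. The counts are toy values, not results.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical record of one pairwise judgment; field names are illustrative.
@dataclass(frozen=True)
class PairJudgment:
    group: str          # "own_worse": the rubric marks the judge's own response as worse
                        # "third_party": neither response was written by the judge
    picked_worse: bool  # True if the judge preferred the rubric-worse response


def error_rate(judgments: list[PairJudgment], group: str) -> Fraction:
    """Fraction of judgments in `group` where the judge picked the worse response."""
    subset = [j for j in judgments if j.group == group]
    if not subset:
        raise ValueError(f"no judgments in group {group!r}")
    return Fraction(sum(j.picked_worse for j in subset), len(subset))


def self_preference_ratio(judgments: list[PairJudgment]) -> Fraction:
    """Relative risk: own-output error rate divided by the third-party baseline.

    A ratio of 3/2 reads as "50% more likely". Raises ZeroDivisionError
    if the baseline error rate is zero.
    """
    return error_rate(judgments, "own_worse") / error_rate(judgments, "third_party")


# Toy counts, made up for illustration only (not results from the paper):
toy = (
    [PairJudgment("own_worse", picked_worse=i < 30) for i in range(100)]
    + [PairJudgment("third_party", picked_worse=i < 20) for i in range(100)]
)
ratio = self_preference_ratio(toy)
print(ratio, f"-> {float(ratio - 1):.0%} more likely")  # 3/2 -> 50% more likely
```

Exact `Fraction` arithmetic is used so that a ratio such as 0.3 / 0.2 does not print as 1.4999999999999998. A real analysis would also report a confidence interval for the ratio rather than a single point estimate.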