Search papers, labs, and topics across Lattice.
1
0
3
7
LLM-as-a-judge consensus is often an illusion: models agree on surface-level features, but diverge wildly when evaluating true quality, a problem fixable by injecting domain knowledge into rubrics.