Don't trust the benchmark: Seemingly equivalent LLMs can disagree wildly on individual examples, leading to irreproducible scientific results when used for annotation.