Using a top- or bottom-performing LLM as the anchor in "LLM-as-a-judge" benchmarks can dramatically skew results, so choosing a mediocre anchor is key to reliable evaluation.
BabyLM 2026 seeks to push the boundaries of data-efficient and cognitively plausible language models, now with a multilingual twist.