Subtle wording changes in benchmark rubrics can swing model performance by over 13%, revealing a hidden subjectivity in "objective" gold labels.
LLM-judged investment rationales reward verbosity and confidence over actual financial insight, penalizing concise, correct reasoning by nearly 3 points.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.