LLMs' performance on false-belief tests isn't just a matter of model size: it is strongly skewed by how the question is phrased, revealing that models learn stereotypical responses to mental-state vocabulary during pre-training.