LLMs that ace standard multiple-choice tests can crumble when the number of options grows sharply, revealing hidden weaknesses in semantic understanding and a surprising bias toward the first few answer choices.