Search papers, labs, and topics across Lattice.
New York University
1
0
2
LLMs struggle to accurately evaluate real human reasoning in mathematics, revealing a critical evaluation gap that challenges current assessment methods.