Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
Theory-driven evaluation of reasoning traces can achieve 2.5x higher correlation with human judgments than existing methods, offering a more reliable way to assess reasoning quality in language models.