University at Buffalo
LLM accuracy jumps 10% when models are evaluated on a cleaned-up version of Humanity's Last Exam, showing how strongly dataset noise can affect reported benchmark scores.