Search papers, labs, and topics across Lattice.
2
0
3
LLM benchmark accuracy jumps 10% when evaluated on a cleaned-up version of Humanity's Last Exam, highlighting the significant impact of dataset noise on performance metrics.
LLMs writing Chinese poetry often nail the structure but botch the rhythm, and even when they get both right, the result can still be artistically bankrupt.