University at Buffalo
LLMs still struggle with PhD-level scanning probe microscopy tasks, but SPM-Bench offers a new automated pipeline to generate challenging scientific benchmarks and quantify model "personalities" like "Conservative" or "Gambler."
Stop hand-rolling your multi-task learning-to-rank models: DeepMTL2R provides a ready-to-use framework with 21 SOTA algorithms and Pareto-optimal optimization.
LLM benchmark accuracy jumps 10% when evaluated on a cleaned-up version of Humanity's Last Exam, highlighting the significant impact of dataset noise on performance metrics.