Zhiling Yan

Publication activity (papers/week, last 8 weeks)

Research focus

Eval Frameworks & Benchmarks (3), Tool Use & Agents (2), Scalable Oversight & Alignment Theory (1), Multimodal Models (1)

Frequent co-authors

Lichao Sun (3), Dingjie Song (1), Hanrong Zhang (1), Yutong Dai (1)

Papers (4)

Jun 4, 2026

Zhiling Yan +7 · Jun 4, 2026

OpenSkill: Open-World Self-Evolution for LLM Agents

OpenSkill enables LLM agents to autonomously evolve their skills and verification mechanisms in open-world settings, achieving superior performance without any target-task supervision.

Zhiling Yan, Dingjie Song, Hanrong Zhang +5

Scalable Oversight & Alignment Theory, Tool Use & Agents

Jun 3, 2026

Tsinghua AI · Jun 3, 2026 · also BAIR, Department of Computer Science, Georgia Tech, KU Leuven +7

Agents' Last Exam

The hardest AI tasks remain largely unsolved, with current models achieving only a 2.6% success rate on economically valuable workflows.

Website GitHub, HuggingFace Leaderboard, Yiyou Sun +306

Eval Frameworks & Benchmarks, Tool Use & Agents

Mar 26, 2026

Dingjie Song +8 · Mar 26, 2026 · also Corresponding author

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

Despite their visual reasoning prowess, today's MLLMs still struggle to understand handwritten math scratchwork, falling far short of human expert performance in diagnosing student errors.

Dingjie Song, Tianlong Xu, Yifei Zhang +6

Eval Frameworks & Benchmarks, Multimodal Models, Reasoning & Chain-of-Thought

Feb 10, 2026

Feb 10, 2026 · also HKUST, Imperial, Lehigh

LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation

LLMs in medicine may be dangerously overhyped: even the best models achieve only 39% accuracy on a contamination-free, real-world clinical benchmark, with performance tanking on newer cases.

Zhiling Yan, Zhe Fang, Yisheng Ji +3

Data Curation & Synthetic Data, Eval Frameworks & Benchmarks, Scientific Discovery & Drug Design
