AI can now design better AI: ASI-Evolve discovers SOTA architectures, curates pretraining data, and designs RL algorithms, outperforming human-designed baselines by significant margins.
A 7B model trained on a new dataset of Chinese porcelain outperforms GPT-4 by 12% on expert connoisseurship tasks, demonstrating the power of domain-specific training and tool integration.
LLMs can't yet reproduce published physics papers end-to-end: the best model scores only 34% on a new benchmark designed for this purpose.
Pretraining isn't just about scaling data volume; daVinci-LLM's ablations reveal that data processing depth, domain-specific strategies, and compositional balance are equally critical for unlocking LLM capabilities.
Achieve high-fidelity transparent text animations from image-to-video models without retraining the VAE, sidestepping data scarcity and latent pattern mixing issues.
Forget toy datasets: OpenSWE delivers 45K+ real-world, executable Python environments for leveling up your SWE agent, and it's all open-sourced.
Today's best AI agents fail at realistic software engineering tasks, stalling below 30% completion and revealing the urgent need for better long-horizon planning and human-AI collaboration.
Frontier LLMs can unlock substantial performance gains in scientific domains by refining and completing raw scientific text, leading to a +8.40 point improvement on domain-aligned tasks.