Xianpei Han

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Eval Frameworks & Benchmarks (7), Tool Use & Agents (5), World Models & Planning (4), Natural Language Processing (3)

Frequent co-authors

Hongyu Lin (15), Yaojie Lu (14), Le Sun (13), Boxi Cao (12)

Papers (15)

Jul 22, 2026

also Baidu

DocOps: A Verifiable Benchmark for Autonomous Agents in Complex Document Operations

Even the most advanced autonomous agents struggle with maintaining document consistency, revealing critical failure modes that could hinder their effectiveness in real-world applications.

Jiazhen Jiang, Boxi Cao, Lingyong Yan +6

Eval Frameworks & Benchmarks, Tool Use & Agents

Jul 14, 2026

ShortOPD: Recovering Pruned LLMs with Short-to-Long On-Policy Distillation

ShortOPD boosts the generative performance of pruned LLMs by nearly 9 times while cutting training time by over 75%.

Qingyu Zhang, Qianhao Yuan, Hongyu Lin +8

Inference & Quantization, Natural Language Processing

Jul 3, 2026

also Fudan, ZJU

PraMem: Practice-derived Experiential Memory for Long-horizon Behavior Prediction

Transforming historical sequences into a powerful resource, PraMem significantly improves long-horizon behavior prediction beyond existing methods.

Zhuoqun Li, Boxi Cao, Jiawei Chen +10

Natural Language Processing, World Models & Planning

Jun 22, 2026

ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models

ReasoningLens turns the opaque reasoning of large models into clear, actionable insights, enabling researchers to pinpoint errors and optimize performance like never before.

Jiasheng Zheng, Boxi Cao, Yaojie Lu +4

Reasoning & Chain-of-Thought

Jun 10, 2026

DAMO; also CAS, NJU

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Recursive composition of verifiable environments can boost reasoning performance in RL by up to 3.1 points while using only a fraction of the original environments.

Hao Xiang, Qiaoyu Tang, Le Yu +7

Reasoning & Chain-of-Thought, Scalable Oversight & Alignment Theory, World Models & Planning

Jun 3, 2026

also Ant Group

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

Current AI agents falter in autonomous development, revealing critical gaps in robustness and alignment as they struggle against human-engineered solutions.

Xinyu Lu, Pengbo Wang, Jun Zhou +5

Eval Frameworks & Benchmarks, Tool Use & Agents

May 29, 2026

also CUHK

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

ADR transforms the landscape of code task generation, enabling LLMs to tackle genuinely novel and challenging coding problems that enhance their performance.

Jiasheng Zheng, Boxi Cao, Boxi Yu +6

Code Generation & Program Synthesis, Data Curation & Synthetic Data, Scalable Oversight & Alignment Theory

May 28, 2026

also iscas.ac.cn

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

Forget scraping – this work shows you can generate high-quality, executable terminal environments from scratch to train language agents that outperform models trained on scraped data.

Xiaoxuan Peng, Kai Zhang, Kaiqi Zhang +6

Code Generation & Program Synthesis, Data Curation & Synthetic Data, Tool Use & Agents

May 25, 2026

also Tsinghua AI, BNRist, Department of Automation

MetaphorVU: Towards Metaphorical Video Understanding

MLLMs can't grasp metaphors in videos, revealing a surprising gap in their high-order cognitive abilities compared to humans.

Zhuoqun Li, Boxi Cao, Guiping Jiang +9

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Apr 30, 2026

also CUHK

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

LLMs trained with ScaleBox, a new high-fidelity code verification system, substantially outperform those trained with heuristic matching, suggesting current RLHF methods are bottlenecked by verification quality.

Xin Zheng, Boxi Cao, Pengbo Wang +7

Code Generation & Program Synthesis, Distributed Systems & Hardware, Eval Frameworks & Benchmarks

Apr 22, 2026

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

Multilingual RAG systems are systematically suppressing "answer-critical" documents in non-English languages, crippling their ability to leverage global knowledge.

Guozhao Mo, Yafei Shi, Boxi Cao +6

Constitutional AI & AI Ethics, Natural Language Processing, Recommendation & Information Retrieval

Apr 18, 2026

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

Forget text-dominance: Today's Omni-modal LLMs surprisingly favor visual inputs, creating new challenges for cross-modal reasoning.

Xinru Yan, Boxi Cao, Yao Lu +5

Eval Frameworks & Benchmarks, Multimodal Models

Apr 9, 2026

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

LLMs exhibit a "Utopian bias" when simulating human behavior, converging towards an unrealistic "positive average person" and failing to capture individual differences and long-tail behaviors.

Jiawei Chen, Ruoxi Xu, Boxi Cao +12

Eval Frameworks & Benchmarks, Tool Use & Agents, World Models & Planning

Mar 10, 2026

CMU ML; also CAS

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

LLMs trained with reinforcement learning from verifiable rewards (RLVR) become overconfident in incorrect answers, but a simple fix, decoupling the reasoning and calibration objectives, can restore proper calibration without sacrificing accuracy.

Zheng Ma, Zhengzhao Ma, Xueru Wen +7

Eval Frameworks & Benchmarks, Reasoning & Chain-of-Thought, RLHF & Preference Learning

Feb 26, 2026

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

By grounding reflection in the visual artifacts of presentation slides, DeepPresenter enables agents to iteratively refine presentations in a way that internal reasoning traces alone cannot.

Haolin Zheng, Guozhao Mo, Xinru Yan +9

Multimodal Models, Tool Use & Agents, World Models & Planning
