Jiaheng Liu

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (7), Eval Frameworks & Benchmarks (7), Tool Use & Agents (4), Code Generation & Program Synthesis (2)

Frequent co-authors

Ziteng Feng (2), Zhiqi Bai (2), He Zhu (2), Qianqian Xie (2)

Papers (9)

Jul 11, 2026

Jiayi Tian +29·2w ago·also HIT

ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory

A general Agent OS can boost long-horizon robotic execution and enable continual learning through structured memory management and self-evolution.

Jiayi Tian, Shiao Liu, Yuting Xu +27

Multimodal Models, Robotics & Embodied AI, Tool Use & Agents

Jun 9, 2026

P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

MLLMs can generate 3D models, but they often miss the mark on precise geometry and coherent assemblies, revealing significant limitations in their structural reasoning abilities.

Yikang Yang, Zhanpeng Hu, Youtian Lin +5

Code Generation & Program Synthesis, Eval Frameworks & Benchmarks, Multimodal Models

Jun 7, 2026

Jun 7, 2026·also Kuaishou

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Current video editing models falter under the weight of complex user instructions, often omitting critical edits and introducing artifacts.

Jiangtao Wu, Jiaming Wang, Yiwen He +5

Eval Frameworks & Benchmarks, Multimodal Models

Jun 2, 2026

OmniHalluc-L: Counterfactual Benchmarking and Modality-Perturbation Reliability Calibration for Long-Form Omni Hallucination

Open-weight Omni models struggle with binding accuracy, achieving only 41.55% on a new counterfactual benchmark, highlighting a critical gap in long-video comprehension.

Zixuan Dong, Jiafu Tang, Zhide Lei +9

Eval Frameworks & Benchmarks, Multimodal Models

Jun 1, 2026

Jun 1, 2026·also PKU, Zhongguancun Laboratory

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

TVIR-Agent reveals that integrating visual elements into report generation can dramatically improve the quality and reliability of analytical outputs.

Xinkai Ma, Zhiqi Bai, Dingling Zhang +21

Eval Frameworks & Benchmarks, Multimodal Models

Xinyu Che +11·Jun 1, 2026·also Kling Team

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Converting noisy, human-centric guides into self-evolving agent skills can yield performance improvements of up to 25.3 percentage points across diverse tasks.

Xinyu Che, Junqi Xiong, Yunfei Ge +9

Multimodal Models, Tool Use & Agents

Jiaming Wang +11·Jun 1, 2026·also JIUTIAN Research

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Span-level error localization can boost deep-research agent reliability by up to 30 percentage points, revealing critical insights into where agents go wrong.

Jiaming Wang, Ziteng Feng, Jiangtao Wu +9

Eval Frameworks & Benchmarks, Tool Use & Agents

Apr 23, 2026

Apr 23, 2026·also Anhui Province Key Laboratory of Digital

When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

LLM agent distillation leads to surprisingly high rates of behavioral mimicry, with some student models exhibiting tool-use habits *more* similar to their teachers than the teacher's own family members.

Chen Yang, Yuning Zhang, Zhoufutu Wen +4

Eval Frameworks & Benchmarks, Inference & Quantization, Tool Use & Agents

Apr 20, 2026

Xinyu Che +12·Apr 20, 2026

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Evaluating web coding LLMs with real-world fidelity reveals that even state-of-the-art models still struggle with aesthetics and framework-specific nuances.

Xinyu Che, Chenchen Zhang, Yukai Huang +10

Code Generation & Program Synthesis, Eval Frameworks & Benchmarks, Multimodal Models
