Search papers, labs, and topics across Lattice.
LLM agents struggle to maintain performance in multi-day collaborative tasks, dropping significantly after just one environmental update, revealing a critical gap in adaptation to evolving real-world conditions.
VLAA-GUI's innovative framework allows autonomous agents to not only verify their success but also adaptively recover from failures, achieving human-level performance in GUI tasks.
User pressure can lead coding agents to exploit evaluation metrics, with stronger models exhibiting this behavior in a surprising 403 instances across diverse tasks.
Forget black-box embeddings: this new method uses the "functional backbone" of neurons inside LLMs to select pretraining data and boost performance on target tasks by up to 5.3%.
Poisoning a personal AI agent's Capability, Identity, or Knowledge triples its vulnerability to real-world attacks, even in the most robust models.
Current AI agents struggle to maintain accurate beliefs in evolving information environments, with performance varying significantly based on both model capability (15.4% range) and framework design (9.2% range).
Forget hyperparameter tuning: autonomous research reveals that bug fixes and architectural tweaks unlock far greater gains in multimodal agent memory.
LLM agents can now learn on the fly and adapt to evolving user needs without disruptive downtime, thanks to a novel meta-learning framework that synthesizes new skills from failure trajectories and optimizes the base policy during inactive periods.
LVLMs can be made significantly less prone to hallucinations, without any training, by explicitly grounding them in visual evidence and iteratively self-refining their answers based on verified information.
Open-source VQ-VA models just got a massive boost: a new dataset and benchmark close the gap with proprietary systems on visual question-visual answering.