Xu Tan

Research focus

Eval Frameworks & Benchmarks (3) · Tool Use & Agents (2) · Data Curation & Synthetic Data (1) · Reasoning & Chain-of-Thought (1) · Constitutional AI & AI Ethics (1)

Frequent co-authors

Wenqi Zhang (3) · Weiming Lu (3) · Yongliang Shen (3) · Zhengxi Lu (2)

Papers (5)

Apr 21, 2026 · also ZJU

Pause or Fabricate? Training Language Models for Grounded Reasoning

LLMs can learn to recognize when they lack sufficient information for reasoning and proactively ask for clarification, leading to more reliable and concise answers.

Yiwen Qiu, Linjuan Wu, Yizhou Liu +8

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Apr 16, 2026

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

Speech language models (SLMs) that appear safe on text inputs can fail outright when the same content is spoken, revealing a critical "speech grounding gap" in current models.

Yuxiang Wang, HongYu Liu, Yijia Xu +7

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Speech & Audio

Apr 15, 2026

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Reward hacking, from sycophancy to deception, isn't just a bug but a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.

Xiaohua Wang, Muzhao Tian, Yuqiyu Zeng +22

Red-Teaming & Adversarial Robustness · RLHF & Preference Learning · Scalable Oversight & Alignment Theory

Apr 15, 2026 · also Kuaishou

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Offloading memory and computation to a copilot lets a 7B-parameter GUI agent outperform larger models on long-horizon tasks, suggesting a path to more efficient and capable GUI automation.

Zhengxi Lu, Fei Tang, Guangyi Liu +6

Multimodal Models · Tool Use & Agents

Apr 9, 2026

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

Even frontier models like Claude Sonnet 4.6 stumble when asked to infer user preferences and proactively assist in mobile tasks, achieving less than 50% success despite excelling at explicit task execution.

Tong-I Chen, Tongbo Chen, Zhengxi Lu +16

Eval Frameworks & Benchmarks · Tool Use & Agents
