Tao Gui

Papers on Lattice

Total citations

Topics

h-index

Publication activitypapers/week, last 8 weeks

Research focus

Tool Use & Agents (4)Eval Frameworks & Benchmarks (3)RLHF & Preference Learning (2)Multimodal Models (2)

Frequent co-authors

Xuanjing Huang (6)Shihan Dou (4)Zhiheng Xi (3)Shichun Liu (2)

Papers (7)

Apr 15, 2026

2w ago

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Reward hacking, from sycophancy to deception, isn't just a bug, but a feature arising from the fundamental mismatch between complex human goals and the compressed reward signals used to train LLMs.

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +22

Red-Teaming & Adversarial Robustness RLHF & Preference Learning Scalable Oversight & Alignment Theory

Jiahang Lin +162w ago·also Fudan

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.

Jiahang Lin, Kai Hu, Binghai Wang +14

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents

Mar 16, 2026

CCTU: A Benchmark for Tool Use under Complex Constraints

Even the best LLMs fail to follow complex constraints in tool use more than 50% of the time, revealing a critical weakness in real-world agent deployment.

Junjie Ye, Guoqiang Zhang, Wenjie Fu +2

Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought Tool Use & Agents

Mar 12, 2026

Can RL Improve Generalization of LLM Agents? An Empirical Study

RFT's impressive in-domain performance masks surprisingly weak generalization to new environments, highlighting a critical challenge for deploying LLM agents in the real world.

Zhiheng Xi, Jiazheng Zhang, Yutao Fan +8

Eval Frameworks & Benchmarks RLHF & Preference Learning Tool Use & Agents

Feb 13, 2026

Feb 13, 2026·also Fudan

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

GPT-5's scientific reasoning skills plummet by nearly 50% when tackling multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.

Yujiong Shen, Yajie Yang, Zhiheng Xi +12

Eval Frameworks & Benchmarks Scientific Discovery & Drug Design Tool Use & Agents

Jan 16, 2026

MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models

Retrofit your VLMs with Multi-Head Latent Attention (MLA) for faster inference and smaller memory footprint, without costly pretraining, using this parameter-efficient conversion framework.

Xiaoran Fan, Zhichao Sun, Tao Ji +2

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Multimodal Models

Jan 7, 2026

Google ResearchJan 7, 2026·also Fudan, HuggingFace

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.

Changhao Jiang, Jiahao Chen, Zhenghao Xiang +14

Data Curation & Synthetic Data Open-Source Models & Weights Speech & Audio

Search

Tao Gui

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (7)