Guangtao Zhai

Shanghai AI Laboratory, Shanghai Jiao Tong University

Research focus

Eval Frameworks & Benchmarks (6) · Multimodal Models (4) · Robotics & Embodied AI (2) · RLHF & Preference Learning (2)

Frequent co-authors

Huiyu Duan (2) · Qiang Hu (2) · Xiongkuo Min (2) · Chunyi Li (2)

Papers (8)

Jun 11, 2026

Institutions: Tsinghua AI, China University of Mining and Technology, Shanghai AI Lab, SJTU, ZJU

RoboProcessBench: Benchmarking Process-Aware Understanding in Vision-Language Robotic Manipulation

Current vision-language models struggle with process understanding in robotic manipulation, but targeted post-training can yield significant improvements.

Dayu Xia, Yue Shi, Yao Mu +8

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Jun 8, 2026

Institutions: Shanghai AI Lab, SJTU

Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating

Sycophancy fine-tuning can induce severe misalignment in language models, but Alignment Gating can reverse it while preserving model performance.

Sicheng Wang, Xiangyang Zhu, Han Wang +6

Constitutional AI & AI Ethics · RLHF & Preference Learning · Scalable Oversight & Alignment Theory

Jun 1, 2026

Institutions: Shanghai AI Lab, SJTU

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Large-scale generative models struggle with low-level vision tasks, revealing critical performance gaps that conventional metrics fail to capture.

Huiyu Duan, Chenxin Zhu, Jintong Lu +4

Computer Vision · Eval Frameworks & Benchmarks

May 28, 2026

Institutions: Tsinghua AI, Artificial Intelligence Laboratory, CUHK, HKUST +5

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Current AI agents struggle to reliably rediscover scientific knowledge: even the top performers average a score of only 21.5, revealing critical gaps in their research capabilities.

Wanghan Xu, Shuo Li, Tianlin Ye +36

Eval Frameworks & Benchmarks · Scientific Discovery & Drug Design

May 25, 2026

Institutions: Shanghai AI Lab

DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation

Reward models that adapt to fine-grained, task-specific criteria can significantly improve text-to-image generation by better aligning with user preferences.

Jiaying Qian, Ziheng Jia, Qian Zhang +4

Computer Vision · Multimodal Models · RLHF & Preference Learning

May 6, 2026

Institutions: Academy, CAS, HKU +5

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.

Yiting Lu, Fengbin Guan, Zhibo Chen +26

Eval Frameworks & Benchmarks · Multimodal Models · World Models & Planning

Apr 16, 2026

Institutions: Shanghai AI Lab, SJTU

MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror

MLLMs still struggle to recognize themselves in a mirror, revealing surprising gaps in their self-centric understanding despite advances in other areas.

Shengyu Guo, Tongrui Ye, Jianbo Zhang +3

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Feb 12, 2026

Institutions: Shanghai AI Lab, SJTU

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction

By blending statistical priors with reasoning, LLMs can predict other LLMs' performance with 14% higher accuracy, even when they see only one or two data points.

Xiaoxiao Wang, Chunxiao Li, Junying Wang +4

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Tool Use & Agents