Forget about task order: the *distribution* of tasks itself dictates the rate and nature of forgetting in continual learning.
VLA models can ace a task yet still trigger unsafe outcomes, exposing a critical gap between action execution and semantic understanding.
LLM judges of disinformation risk are internally consistent, but consistently misaligned with actual human readers, raising serious questions about their validity as evaluation proxies.
You can dial up or down how obvious an AI's hallucinations are, giving you control over whether users catch the errors.
Autonomous agents are alarmingly easy to trick into harmful behavior, even when built on aligned models: attacks against Claude Code succeed 73.63% of the time on the AgentHazard benchmark.
MLLMs can be blind to the consequences of their actions, and simply scaling model size won't fix the problem.
Backdoors aren't just for attacks anymore: B4G shows how they can be flipped to enhance LLM safety, controllability, and accountability.
AI-generated images betray themselves not by their appearance, but by their *behavior*: they are far more sensitive to small perturbations than real images, revealing a fundamental weakness exploitable for universal detection.
GPT-5's scientific reasoning performance plummets by nearly 50% on multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.