Yixu Wang

Guard models trained with BraveGuard can detect safety threats in computer-use agents with over 82% accuracy, a significant leap from conventional methods.

Yunhao Feng, Xiaohu Du, Xinhao Deng +14

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Tool Use & Agents

Jan 4, 2026

Jan 4, 2026 · also ZJU

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Even state-of-the-art multimodal LLMs like GPT-5.2 and Claude 4.5 can be jailbroken nearly half the time using OpenRT's diverse suite of attacks, revealing a critical lack of generalization across attack paradigms.

Xin Wang, Yunhao Chen, Juncheng Li +8

Publication activity (papers/week, last 8 weeks)

Papers (3)