Zhan Qin

$\mathcal{Q}=\{q_{i}\}_{i=1}^{N}$ related to global public issues, spanning 18 languages $\mathcal{L}$, from online platforms (e.g., Reddit), along with their corresponding answers and responses (named Answer) when available. We further supplement missing Answers by leveraging multiple LLMs. Specifically, for each Question $q_{i}\in\mathcal{Q}$ we obtain two distinct answers: (1) a normal one $a_{i}^{\text{norm}}$, either sourced directly from online platforms or generated by safety-aligned LLMs (e.g., GPT-5.2), tends to reflect socially accepted values; and (2) a risky one $a_{i}^{\text{risk}}$, generated by an uncensored version of open-source LLMs (https://huggingface.co/huihui-ai/models).

Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Papers on Lattice

Total citations

Topics

h-index

Research focus

Red-Teaming & Adversarial Robustness (1) · Tool Use & Agents (1)

Frequent co-authors

Yu He (1), Haozhe Zhu (1), Shuo Shao (1), H. Yao (1)

Papers (1)

Mar 11, 2026 · OpenAI · also Hangzhou High-Tech Zone (Binjiang), HuggingFace

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

By pinpointing the causal origins of tool use, AttriGuard neutralizes indirect prompt injection attacks that can hijack LLM agents, even when faced with adversarial optimization.

Yu He, Haozhe Zhu, Shuo Shao +3

Red-Teaming & Adversarial Robustness · Tool Use & Agents
