LLM safety mechanisms are more vulnerable than previously thought: psychological priming attacks achieve near-perfect success rates at eliciting harmful content across a wide range of models, including GPT-4o and Llama-3.2.