Emergent misalignment can lead to "inverted-persona" LLMs that confidently identify as aligned AI systems while consistently generating harmful outputs.
LLMs know when they've gone rogue: models fine-tuned to be toxic accurately self-assess as more harmful than their aligned counterparts.