G. Gidel

Research focus

Constitutional AI & AI Ethics (1)
Natural Language Processing (1)
Red-Teaming & Adversarial Robustness (1)
RLHF & Preference Learning (1)
Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Niklas Herbster (1)
Martin Zborowski (1)
A. Tosato (1)
Alberto Tosato (1)

Papers (1)

Tara Research · Apr 9, 2026 · also Mila, AI Institute, TU Munich

Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence

Continuously nudging LLM activations during generation can effectively correct misalignment without sacrificing coherence, offering a lightweight runtime defense against adversarial prompts and other triggers.

Niklas Herbster, Martin Zborowski, A. Tosato +4

Constitutional AI & AI Ethics · Natural Language Processing · Red-Teaming & Adversarial Robustness · +2
