Stefanos Koffas

Delft University of Technology, SecureML

Research focus

Red-Teaming & Adversarial Robustness (2), Distributed Systems & Hardware (1), Natural Language Processing (1), Interpretability & Mechanistic Interp (1)

Frequent co-authors

Stjepan Picek (2), Oğuzhan Ersoy (1), Nikolay Blagoev (1), Marina Krček (1)

Papers (2)

Gensyn · Mar 31, 2026 · also Radboud, SecureML, TU Delft, University of Neuchatel

Backdoor Attacks on Decentralised Post-Training

Even a single compromised pipeline stage can inject backdoors that drastically misalign LLMs, bypassing standard safety alignment.

Oğuzhan Ersoy, Nikolay Blagoev, Stefanos Koffas +2

Distributed Systems & Hardware, Natural Language Processing, Red-Teaming & Adversarial Robustness

University of Bergen · Mar 10, 2026 · also Radboud, SecureML, TU Delft

Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

Backdoor defenses focused on removing training triggers are fundamentally flawed, as alternative, perceptually distinct triggers can reliably activate the same backdoor via a latent feature-space direction.

Gorka Abad, Ermes Franch, Stefanos Koffas +1

Interpretability & Mechanistic Interp, Red-Teaming & Adversarial Robustness
