Delft University of Technology, SecureML
Even a single compromised pipeline stage can inject backdoors that drastically misalign LLMs, bypassing standard safety alignment.
Backdoor defenses that focus on removing training triggers are fundamentally flawed: alternative, perceptually distinct triggers can reliably activate the same backdoor through a direction in latent feature space.