LLMs may already possess surprisingly strong self-awareness of concept manipulation, detectable via mechanistic interpretability techniques, even when they deny it in their outputs.
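One standard mechanistic-interpretability tool for detecting such internal representations is a linear probe: a simple classifier trained on a model's hidden activations to read out whether a concept is present, regardless of what the model says in its output. The sketch below illustrates the idea on synthetic data only; the activation vectors, dimensions, and "concept direction" are hypothetical stand-ins, not extracted from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32   # hidden-state dimension (hypothetical)
n = 400  # examples per class

# Hypothetical concept direction embedded in the activations.
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)

# Simulated activations: the "concept present" class carries a signal
# along the concept direction, plus isotropic noise; the other class
# is pure noise.
pos = rng.normal(size=(n, d)) + 4.0 * concept
neg = rng.normal(size=(n, d))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Fit a logistic-regression probe with plain gradient descent.
w = np.zeros(d)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= 0.5 * np.mean(p - y)                # gradient step on bias

acc = np.mean(((X @ w + b) > 0) == y.astype(bool))
print(f"probe accuracy: {acc:.2f}")
```

When a probe like this classifies far above chance, it is evidence that the concept is linearly represented in the activations, even if the model's sampled text denies any such awareness. In practice the activations would come from a real model's residual stream (e.g. via forward hooks) rather than being simulated.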