Alexander Panfilov

MPI for Intelligent Systems, ELLIS Institute Tübingen, Tübingen AI Center

Research focus

Eval Frameworks & Benchmarks (1), Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Joachim Schaeffer (1), Thomas Jiralerspong (1), Guillaume Lajoie (1), Jonas Geiping (1)

Papers (1)

Jun 9, 2026

Mila · also DeepMind, Astra Fellowship, ELLIS, Max Planck +1

CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs

LLMs often detect control interventions, and their level of awareness varies significantly across models and tasks, revealing vulnerabilities in AI safety protocols.

Joachim Schaeffer, Thomas Jiralerspong, Alexander Panfilov +4

Eval Frameworks & Benchmarks, Scalable Oversight & Alignment Theory
