Thomas Jiralerspong

Mila - Quebec AI Institute, Université de Montréal, Astra Fellowship

Publication activity (papers/week, last 8 weeks)

Research focus

Eval Frameworks & Benchmarks (1), Scalable Oversight & Alignment Theory (1)

Frequent co-authors

Joachim Schaeffer (1), Alexander Panfilov (1), Guillaume Lajoie (1), Jonas Geiping (1)

Papers (1)

Jun 9, 2026

Mila · 1w ago · also DeepMind, Astra Fellowship, ELLIS, Max Planck +1

CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs

Control interventions are often detected by LLMs, with awareness levels varying significantly across models and tasks, revealing vulnerabilities in AI safety protocols.

Joachim Schaeffer, Thomas Jiralerspong, Alexander Panfilov +4

Eval Frameworks & Benchmarks, Scalable Oversight & Alignment Theory
