Michał Brzozowski

Research focus

Natural Language Processing (3) · Interpretability & Mechanistic Interp (2) · Architecture Design (Transformers, SSMs, MoE) (1) · Constitutional AI & AI Ethics (1)

Frequent co-authors

Neo Christopher Chung (4) · Enrico Cassano (2) · Zuzanna Dubanowska (2) · Paolo Mandica (1)

Papers (4)

Jun 17, 2026

6d ago·also Samsung

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

Achieving 97.44% of optimal performance without any adapter training or internal access, ARIADNE revolutionizes how we dynamically select task-specific models at inference time.

Enrico Cassano, Michał Brzozowski, Zuzanna Dubanowska +2

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

Jun 1, 2026

3w ago·also Warsaw

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

The supposed stability of archetypal SAEs evaporates when initialization is randomized, challenging the reliability of their concept extraction claims.

Michał Brzozowski, Neo Christopher Chung

Interpretability & Mechanistic Interp

3w ago·also Warsaw

The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

LLMs are not just generating random names; they create persistent, correlated character ensembles that are infiltrating academic publishing and could undermine scholarly integrity.

Michał Brzozowski, Neo Christopher Chung

Constitutional AI & AI Ethics · Natural Language Processing

May 25, 2026

May 25, 2026·also Turin, Warsaw

Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing

Forget white-box access: this grey-box method recovers verbatim memorized content from finetuned LLMs by just comparing output logits, even revealing hidden data pipeline artifacts.

Michał Brzozowski, Zuzanna Dubanowska, Enrico Cassano +1

Interpretability & Mechanistic Interp · Natural Language Processing · Red-Teaming & Adversarial Robustness
