Alexey Dontsov

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Interpretability & Mechanistic Interp (1)

Frequent co-authors

Anton Korznikov (1), Andrey V. Galichin (1), Oleg Y. Rogov (1), Ivan V. Oseledets (1)

Papers (1)

Feb 15, 2026 · also AIRI, ISP RAS Research Center for Trusted AI

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Sparse autoencoders, hyped as a key interpretability tool, may not be learning much more than random feature sets, casting doubt on their ability to decompose model internals.

Anton Korznikov, Andrey V. Galichin, Alexey Dontsov +3

Interpretability & Mechanistic Interp
