Or Shafran

Blavatnik School of Computer Science and AI, Tel Aviv University

Papers on Lattice

Total citations

Topics

h-index

Research focus

Interpretability & Mechanistic Interp (1)

Frequent co-authors

Atticus Geiger (1), Mor Geva (1)

Papers (1)

Jun 12, 2025 · also Google Research

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.

Or Shafran, Atticus Geiger, Mor Geva

Interpretability & Mechanistic Interp
