Search papers, labs, and topics across Lattice.
Blavatnik School of Computer Science and AI, Tel Aviv University
1
3
1
1
Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.