Sparse autoencoders, despite their popularity for extracting interpretable features, often fail to capture the underlying manifold structure of concepts, instead fragmenting them across multiple, diluted features.
LLMs often know the answer long before their "reasoning" suggests they do, wasting tokens on performative chain-of-thought.
Precisely steer LLM behaviors like refusal, sycophancy, and style transfer by surgically activating just a few key attention heads identified via Generative Causal Mediation.
Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.