The authors identify and validate entity-selective neurons in early layers of language models using templated prompts and causal interventions on PopQA. They demonstrate that ablating these neurons causes entity-specific amnesia, while activating them in a controlled manner improves answer retrieval, suggesting these neurons act as compact entity retrieval mechanisms. The robustness of these neurons to variations in entity names further supports a canonicalization interpretation of their function.
Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.
Language models can answer many entity-centric factual questions, but it remains unclear which internal mechanisms are involved in this process. We study this question across multiple language models. We localize entity-selective MLP neurons using templated prompts about each entity, and then validate them with causal interventions on PopQA-based QA examples. On a curated set of 200 entities drawn from PopQA, localized neurons concentrate in early layers. Negative ablation produces entity-specific amnesia, while controlled injection at a placeholder token improves answer retrieval relative to mean-entity and wrong-cell controls. For many entities, activating a single localized neuron is sufficient to recover entity-consistent predictions once the context is initialized, consistent with compact entity retrieval rather than purely gradual enrichment across depth. Robustness to aliases, acronyms, misspellings, and multilingual forms supports a canonicalization interpretation. The effect is strong but not universal: not every entity admits a reliable single-neuron handle, and coverage is higher for popular entities. Overall, these results identify sparse, causally actionable access points for analyzing and modulating entity-conditioned factual behavior.
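The localization step described above can be illustrated with a toy sketch. This is not the authors' code: the scoring rule (a z-score of the target entity's mean activation against all other entities), the function names `selectivity_scores`, `localize`, and `ablate`, and the activation values are all assumptions made for illustration. In particular, `ablate` just overwrites one activation with a chosen value, since the exact form of the paper's negative ablation is not given here.

```python
from statistics import mean, pstdev

def selectivity_scores(acts, target):
    """Score each neuron by how much more it fires for `target` than for other entities.

    acts maps entity -> list of activation vectors, one per templated prompt.
    The z-score rule used here is an illustrative assumption.
    """
    n_neurons = len(acts[target][0])
    scores = []
    for j in range(n_neurons):
        target_mean = mean(row[j] for row in acts[target])
        others = [row[j] for e, rows in acts.items() if e != target for row in rows]
        spread = pstdev(others) or 1e-8  # guard against zero spread
        scores.append((target_mean - mean(others)) / spread)
    return scores

def localize(acts, target):
    """Return the index of the most entity-selective neuron for `target`."""
    scores = selectivity_scores(acts, target)
    return max(range(len(scores)), key=scores.__getitem__)

def ablate(hidden, neuron, value=0.0):
    """Return a copy of `hidden` with one neuron's activation overwritten."""
    out = list(hidden)
    out[neuron] = value
    return out

# Made-up MLP activations: 3 neurons, 2 templated prompts per entity.
acts = {
    "Paris": [[0.1, 2.0, 0.3], [0.2, 1.8, 0.1]],
    "Tokyo": [[0.1, 0.1, 1.9], [0.3, 0.2, 2.1]],
    "Cairo": [[1.5, 0.0, 0.2], [1.7, 0.1, 0.1]],
}

paris_neuron = localize(acts, "Paris")
ablated = ablate(acts["Paris"][0], paris_neuron)
```

In a real model the activation vectors would come from forward hooks on the MLP layers, and the causal check would compare answer accuracy with and without the intervention, which this sketch does not attempt.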