Google Research, Independent Researcher, Mentaleap, TAU | Apr 1, 2026 | arXiv:2604.01404

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

Itay Yona, Daniel Barzilay, Dan Barzilay, M. Karasik, Michael Karasik, Mor Geva

AI Summary

The authors identify and validate entity-selective neurons in early layers of language models using templated prompts and causal interventions on PopQA. They demonstrate that ablating these neurons causes entity-specific amnesia, while activating them in a controlled manner improves answer retrieval, suggesting these neurons act as compact entity retrieval mechanisms. The robustness of these neurons to variations in entity names further supports a canonicalization interpretation of their function.

Key Contribution

Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.

Abstract

Language models can answer many entity-centric factual questions, but it remains unclear which internal mechanisms are involved in this process. We study this question across multiple language models. We localize entity-selective MLP neurons using templated prompts about each entity, and then validate them with causal interventions on PopQA-based QA examples. On a curated set of 200 entities drawn from PopQA, localized neurons concentrate in early layers. Negative ablation produces entity-specific amnesia, while controlled injection at a placeholder token improves answer retrieval relative to mean-entity and wrong-cell controls. For many entities, activating a single localized neuron is sufficient to recover entity-consistent predictions once the context is initialized, consistent with compact entity retrieval rather than purely gradual enrichment across depth. Robustness to aliases, acronyms, misspellings, and multilingual forms supports a canonicalization interpretation. The effect is strong but not universal: not every entity admits a reliable single-neuron handle, and coverage is higher for popular entities. Overall, these results identify sparse, causally actionable access points for analyzing and modulating entity-conditioned factual behavior.
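The localize-then-intervene idea in the abstract can be illustrated with a toy sketch. The abstract does not state the scoring rule or the exact form of negative ablation, so everything here is an assumption made for illustration: the function names (`selectivity_scores`, `localize`, `negative_ablate`), the standardized mean-difference score, the reading of negative ablation as forcing the activation to a negative value, and the made-up activation values. It is not the authors' procedure.

```python
from statistics import mean, pstdev


def selectivity_scores(acts_by_entity, target):
    """Score each neuron by how much more it fires on `target` prompts than on others.

    acts_by_entity maps an entity name to a list of activation vectors,
    one vector per templated prompt and one float per neuron.
    The standardized mean difference is an assumed criterion.
    """
    n_neurons = len(acts_by_entity[target][0])
    scores = []
    for j in range(n_neurons):
        on_target = [vec[j] for vec in acts_by_entity[target]]
        off_target = [
            vec[j]
            for name, vecs in acts_by_entity.items()
            if name != target
            for vec in vecs
        ]
        spread = pstdev(off_target) or 1.0  # fall back to 1.0 if there is no spread
        scores.append((mean(on_target) - mean(off_target)) / spread)
    return scores


def localize(acts_by_entity, target):
    """Return the index of the most target-selective neuron."""
    scores = selectivity_scores(acts_by_entity, target)
    return max(range(len(scores)), key=scores.__getitem__)


def negative_ablate(activations, neuron, scale=1.0):
    """Return a copy with the chosen neuron forced to a negative value (assumed form)."""
    out = list(activations)
    out[neuron] = -scale * abs(out[neuron])
    return out


# Hypothetical activations: 3 entities, 2 templated prompts each, 3 neurons.
toy = {
    "Entity A": [[0.1, 2.0, 0.0], [0.2, 1.8, 0.1]],
    "Entity B": [[0.1, 0.0, 1.5], [0.3, 0.1, 1.7]],
    "Entity C": [[0.2, 0.1, 0.0], [0.1, 0.0, 0.2]],
}
cell = localize(toy, "Entity A")
ablated = negative_ablate(toy["Entity A"][0], cell)
```

In a real model the activation vectors would come from MLP neurons at the entity tokens, and the ablated values would be written back during the forward pass so that downstream answer accuracy can be compared against control neurons.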

Eval Frameworks & Benchmarks, Interpretability & Mechanistic Interp, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 27

Year: 2026

Venue: N/A
