Mor Geva

Tel Aviv University, Google Research

Research focus

Eval Frameworks & Benchmarks (3) · Interpretability & Mechanistic Interp (3) · Constitutional AI & AI Ethics (1) · RLHF & Preference Learning (1) · Architecture Design (Transformers, SSMs, MoE) (1)

Frequent co-authors

G. Yona (1) · Yossi Matias (1) · Asaf Avrahamy (1) · Yoav Gur-Arieh (1)

Papers (5)

Google Research · May 2, 2026 · also TAU

Hallucinations Undermine Trust; Metacognition is a Way Forward

LLMs' persistent hallucinations aren't just about lacking knowledge, but about lacking the self-awareness to know what they *don't* know, suggesting uncertainty expression is key to building trustworthy AI.

G. Yona, Mor Geva, Yossi Matias

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · RLHF & Preference Learning

Asaf Avrahamy +2 · Apr 7, 2026 · also Google Research, TAU

Disentangling MLP Neuron Weights in Vocabulary Space

Unlocking interpretability just got easier: ROTATE disentangles MLP neurons without data, revealing sparse, concept-aligned vocabulary channels directly from model weights.

Asaf Avrahamy, Yoav Gur-Arieh, Mor Geva

Architecture Design (Transformers, SSMs, MoE) · Interpretability & Mechanistic Interp

Mentaleap · Apr 1, 2026 · also Google Research, Independent Researcher, TAU

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.

Itay Yona, Dan Barzilay, Daniel Barzilay +3

Eval Frameworks & Benchmarks · Interpretability & Mechanistic Interp · Natural Language Processing

Google Research · Mar 10, 2026 · also AI2, TAU, Technion

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

Reasoning unlocks factual knowledge in LLMs, but beware: hallucinated reasoning steps can poison the well.

Zorik Gekhman, Roee Aharoni, E. Ofek +4

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Jun 12, 2025 · also Google Research

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.

Or Shafran, Atticus Geiger, Mor Geva

Interpretability & Mechanistic Interp
