Search papers, labs, and topics across Lattice.
Tel Aviv University, Google Research
5
3
7
32
LLMs' persistent hallucinations aren't just about lacking knowledge, but about lacking the self-awareness to know what they *don't* know, suggesting uncertainty expression is key to building trustworthy AI.
Unlocking interpretability just got easier: ROTATE disentangles MLP neurons without data, revealing sparse, concept-aligned vocabulary channels directly from model weights.
Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.
Reasoning unlocks factual knowledge in LLMs, but beware: hallucinated reasoning steps can poison the well.
Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.