Sarah Wiegreffe

University of Maryland

Publication activity (papers/week, last 8 weeks)

Research focus

Interpretability & Mechanistic Interp (2), Reasoning & Chain-of-Thought (1), Red-Teaming & Adversarial Robustness (1), RLHF & Preference Learning (1)

Frequent co-authors

Hillary N. Owusu (1), Naomi H. Feldman (1), Stephen Cheng (1), Stephen Cheng (1)

Papers (2)

Jun 11, 2026

Localizing Anchoring Pathways in Language Models

Edge-level methods uncover how irrelevant numerical anchors influence language model judgments, revealing shared pathways that shift with model tuning.

Hillary N. Owusu, Sarah Wiegreffe, Naomi H. Feldman

Interpretability & Mechanistic Interp, Reasoning & Chain-of-Thought

Apr 9, 2026

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

Steering vectors work primarily by nudging the output value (OV) circuit in attention, not by re-weighting attention scores, and can be drastically sparsified without losing effectiveness.

Stephen Cheng, Stephen Cheng, Sarah Wiegreffe +1

Interpretability & Mechanistic Interp, Red-Teaming & Adversarial Robustness, RLHF & Preference Learning
