Chain-of-Thought explanations can be made significantly more faithful by training models to produce reasoning steps that allow a simulator to accurately predict outputs on counterfactual inputs.
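To make the counterfactual-simulatability idea concrete, here is a minimal Python sketch of the evaluation loop it implies: a separate simulator reads only the model's explanation and tries to predict the model's answer on perturbed inputs. All names here (`model_answer`, `simulator_predict`, `make_counterfactuals`) are hypothetical placeholders, not APIs from the work being summarized.

```python
def counterfactual_simulatability(model_answer, simulator_predict,
                                  question, explanation,
                                  make_counterfactuals, n=20):
    """Score how well an explanation lets a simulator predict the model's
    behavior on perturbed (counterfactual) versions of the question."""
    counterfactuals = make_counterfactuals(question, n)  # e.g., entity swaps
    hits = 0
    for cf in counterfactuals:
        predicted = simulator_predict(explanation, cf)  # simulator sees only the CoT
        actual = model_answer(cf)                       # the model's actual output
        hits += int(predicted == actual)
    return hits / len(counterfactuals)  # higher = more faithful explanation
```

Under this framing, training for faithfulness means optimizing the explainer so that this score rises: reasoning steps that genuinely drive the answer generalize to counterfactuals, while post-hoc rationalizations do not.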
LLMs encode truthfulness along a spectrum from general principles to narrow domain expertise, and exploiting domain-specific truth representations is key to steering model behavior.
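One common way to operationalize a "domain-specific truth representation" is a difference-of-means probe over hidden activations, which can then be used as a steering vector. The sketch below assumes access to a model's hidden states via a hypothetical `get_hidden_state` function; the statement sets and scaling factor are illustrative, not from the summarized work.

```python
import numpy as np

def truth_direction(get_hidden_state, true_stmts, false_stmts):
    """Estimate a per-domain 'truth' direction as the difference of mean
    hidden-state activations over true vs. false statements in that domain."""
    h_true = np.mean([get_hidden_state(s) for s in true_stmts], axis=0)
    h_false = np.mean([get_hidden_state(s) for s in false_stmts], axis=0)
    d = h_true - h_false
    return d / np.linalg.norm(d)  # normalize for controlled steering strength

def steer(hidden, direction, alpha=5.0):
    """Shift an activation toward 'true' along the domain's direction."""
    return hidden + alpha * direction
```

The claim that truthfulness spans general principles to narrow expertise suggests these directions are computed per domain (e.g., separately for medical facts vs. geography), rather than assuming a single global truth axis transfers everywhere.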