He He

Anthropic

Research focus

Eval Frameworks & Benchmarks (1), Reasoning & Chain-of-Thought (1), Red-Teaming & Adversarial Robustness (1)

Frequent co-authors

Chen Yueh-Han (1), Robert McCarthy (1), Bruce W. Lee (1), Ian Kivlichan (1)

Papers (1)

Anthropic · Mar 5, 2026 · also Google Research

Reasoning Models Struggle to Control their Chains of Thought

Reasoning models are surprisingly bad at controlling their own thoughts: Claude Sonnet 4.5 can control its chain-of-thought only 2.7% of the time, raising questions about the reliability of CoT monitoring.

Chen Yueh-Han, Robert McCarthy, Bruce W. Lee +5

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Red-Teaming & Adversarial Robustness
