Google DeepMind

7 papers from Google DeepMind on Reasoning & Chain-of-Thought

Mar 31, 2026

Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

Training LLMs on objectives that conflict between the final output and the reasoning process can significantly degrade the monitorability of Chain-of-Thought, making oversight more difficult.

Max Kaufmann, David Lindner, Roland S. Zimmermann +1

Reasoning & Chain-of-Thought RLHF & Preference Learning Scalable Oversight & Alignment Theory

Mar 10, 2026

Google Research·3w ago·also CMU ML, DeepMind

Think Before You Lie: How Reasoning Improves Honesty

LLMs get *more* honest when they have time to reason, defying human tendencies and revealing surprising insights about their internal representational geometry.

Ann Yuan, Asma Ghandeharioun, Carter Blum +6

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought

DeepMind·3w ago

Quantifying the Necessity of Chain of Thought through Opaque Serial Depth

Mixture-of-Experts models might be hiding more of their reasoning than we thought, thanks to a newly quantified "opaque serial depth" metric.

Jonah Brown-Cohen, David Lindner, Rohin Shah

Architecture Design (Transformers, SSMs, MoE) Interpretability & Mechanistic Interp Reasoning & Chain-of-Thought

Mar 5, 2026

3w ago·also DeepMind, Defence Science and Technology Group

Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models

Achieve significantly better code generation and mathematical problem solving from diffusion language models with a simple, training-free sampling tweak that encourages diversity.

Sean Lamont, Christian Walder, Christian J. Walder +3

Code Generation & Program Synthesis Natural Language Processing Reasoning & Chain-of-Thought

Feb 24, 2026

DeepMind·Feb 24, 2026·also CMU ML, Google Research

Aletheia tackles FirstProof autonomously

Gemini 3 Deep Think can now autonomously solve a majority of problems in a challenging math competition, signaling a leap in AI's mathematical reasoning capabilities.

Tony Feng, Junehyuk Jung +27

Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought Tool Use & Agents

Feb 18, 2026

Microsoft Research·Feb 18, 2026

Training Large Reasoning Models Efficiently via Progressive Thought Encoding

Forget full-cache rollouts: this parameter-efficient fine-tuning method lets large reasoning models maintain accuracy while slashing memory usage during RL training.

Zeliang Zhang, XiaoDong Liu, Hao Cheng +3

Inference & Quantization Reasoning & Chain-of-Thought Training Efficiency & Optimization

Oct 13, 2025

Microsoft Research·Oct 13, 2025·also DeepMind, Google Research

Bag of Tricks for Subverting Reasoning-based Safety Guardrails

Reasoning-based safety guardrails, once thought to be a strong defense against jailbreaks, crumble with just a few strategically placed tokens.

Shuo Chen, Zhen Han, Haokun Chen +6

Constitutional AI & AI Ethics Reasoning & Chain-of-Thought Red-Teaming & Adversarial Robustness
