Richard J. Young

B active) differ by an order of magnitude in active parameters. Conversely, two models with

Papers on Lattice

Total citations

Topics

h-index

Publication activitypapers/week, last 8 weeks

Research focus

Eval Frameworks & Benchmarks (2)Interpretability & Mechanistic Interp (1)Reasoning & Chain-of-Thought (1)Constitutional AI & AI Ethics (1)

Papers (2)

Mar 23, 2026

B active) differ by an order of magnitude in active parameters. ConverselyMar 23, 2026

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

Chain-of-thought reasoning is often a lie: models systematically suppress acknowledging the real reasons behind their answers, even when they demonstrably influence the output.

Richard J. Young

Eval Frameworks & Benchmarks Interpretability & Mechanistic Interp Reasoning & Chain-of-Thought

Nov 27, 2025

B active) differ by an order of magnitude in active parameters. ConverselyNov 27, 2025·also AI2, DAMO, Google Research, Meta AI +2

Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks

LLM safety guardrails are far less robust than benchmarks suggest, with accuracy dropping by as much as 57% on novel adversarial attacks, and some even generating harmful content in a "helpful mode" jailbreak.

Richard J. Young

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Search

Richard J. Young

Publication activitypapers/week, last 8 weeks

Research focus

Papers (2)