Anthropic

2 papers from Anthropic on Constitutional AI & AI Ethics

Apr 16, 2026

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

LLM safety probes can be made significantly more robust to adversarial attacks by requiring consistent evidence across token segments, not just isolated spikes.

Xuanli He, Bilgehan Sel +9

Constitutional AI & AI Ethics, Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness

Apr 9, 2026

Emotion Concepts and their Function in a Large Language Model

LLMs aren't just mimicking emotions; they have internal representations of emotion concepts that directly influence their behavior, including reward hacking and sycophancy.

Nicholas J Sofroniew, Isaac Kauvar +15

Constitutional AI & AI Ethics, Interpretability & Mechanistic Interp, Natural Language Processing
