Apr 27, 2026

arXiv:2604.24693

Contextual Linear Activation Steering of Language Models

Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

AI Summary

This paper introduces Contextual Linear Activation Steering (CLAS), which dynamically adjusts linear activation steering strength based on input context to improve consistency. CLAS learns context-dependent steering strengths, outperforming standard linear activation steering across eleven benchmarks and four model families. The method achieves performance comparable to ReFT and LoRA in low-data regimes, offering a more accurate and scalable steering approach.

Key Contribution

CLAS replaces a single fixed steering strength with one that adapts to the input context, giving more consistent control over LLM behavior.

Abstract

Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that adapts the steering strength to the input context instead of fixing it in advance. Across eleven steering benchmarks and four model families, CLAS consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
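
To make the contrast in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not say how CLAS computes its context-dependent strengths, so the linear form alpha_t = w . h_t + b, the function names, and the toy numbers below are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer_fixed(hidden: Sequence[Vector], direction: Vector, alpha: float) -> List[Vector]:
    """Standard linear steering: add the same alpha * direction to every token."""
    return [[h_i + alpha * d_i for h_i, d_i in zip(h, direction)] for h in hidden]


def steer_contextual(
    hidden: Sequence[Vector], direction: Vector, w: Vector, b: float
) -> List[Vector]:
    """Hypothetical contextual steering: each token gets its own strength.

    ASSUMPTION: the strength is a linear function of the token's activation,
    alpha_t = w . h_t + b. The source does not specify this form.
    """
    steered = []
    for h in hidden:
        alpha_t = dot(w, h) + b  # strength depends on this token's activation
        steered.append([h_i + alpha_t * d_i for h_i, d_i in zip(h, direction)])
    return steered


# Toy activations for two tokens in a 2-dimensional hidden space.
hidden = [[1.0, 0.0], [0.0, 2.0]]
direction = [1.0, 1.0]

fixed = steer_fixed(hidden, direction, alpha=0.5)
contextual = steer_contextual(hidden, direction, w=[1.0, 0.0], b=0.0)
```

In this toy setup the fixed variant shifts both tokens by the same amount, while the contextual variant shifts the first token fully and leaves the second unchanged, because its assumed strength evaluates to zero there.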

Interpretability & Mechanistic Interp

Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
