DFKI | Saarland Informatics Campus | Apr 15, 2026 | arXiv:2604.14090

From Weights to Activations: Is Steering the Next Frontier of Adaptation?

Simon Ostermann, Daniil Gurgurov, Tanja Baeumel, Michael A. Hedderich, Sebastian Lapuschkin, Wojciech Samek, Vera Schmitt

AI Summary

This paper reframes activation steering as a distinct paradigm for model adaptation, contrasting it with parameter updates and input-based methods. The authors propose functional criteria for analyzing steering and comparing it with traditional adaptation techniques such as fine-tuning. Their analysis highlights steering's ability to induce local, reversible behavioral changes through targeted interventions in activation space, without modifying model parameters.

Key Contribution

Steering is more than a trick: it is a distinct way to adapt language models, offering localized, reversible control over behavior at inference time without the parameter updates that fine-tuning requires.

Abstract

Post-training adaptation of language models is commonly achieved through parameter updates or input-based methods such as fine-tuning, parameter-efficient adaptation, and prompting. In parallel, a growing body of work modifies internal activations at inference time to influence model behavior, an approach known as steering. Despite increasing use, steering is rarely analyzed within the same conceptual framework as established adaptation methods. In this work, we argue that steering should be regarded as a form of model adaptation. We introduce a set of functional criteria for adaptation methods and use them to compare steering approaches with classical alternatives. This analysis positions steering as a distinct adaptation paradigm based on targeted interventions in activation space, enabling local and reversible behavioral change without parameter updates. The resulting framing clarifies how steering relates to existing methods, motivating a unified taxonomy for model adaptation.
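The contrast the abstract draws between parameter updates and interventions in activation space can be made concrete with a toy sketch. It assumes a hypothetical two-layer network written in pure Python; `ToyModel`, `add_steering`, the weights, and the steering vector are illustrative only and are not taken from the paper. Steering here adds a scaled vector (alpha times v) to the hidden activation at one layer during the forward pass. The weights are never written to, and calling the returned `remove` function restores the original behavior. PyTorch-based implementations commonly achieve the same effect with `register_forward_hook`, whose returned handle's `remove()` plays the same role. The paper's own formulation of steering may differ from this simple additive form.

```python
from copy import deepcopy


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


class ToyModel:
    """Hypothetical two-layer network standing in for a language model."""

    def __init__(self, weights):
        self.weights = weights  # parameters: read during forward, never modified
        self.hooks = {}         # layer index -> list of activation-editing functions

    def add_steering(self, layer, vector, alpha=1.0):
        """Register h <- h + alpha * vector after `layer`; return an undo function."""
        def hook(h):
            return [hi + alpha * vi for hi, vi in zip(h, vector)]

        self.hooks.setdefault(layer, []).append(hook)

        def remove():
            self.hooks[layer].remove(hook)  # reversibility: drop the intervention

        return remove

    def forward(self, x):
        h = x
        for i, W in enumerate(self.weights):
            h = matvec(W, h)
            if i < len(self.weights) - 1:
                h = relu(h)                  # nonlinearity on hidden layers only
            for hook in self.hooks.get(i, []):
                h = hook(h)                  # intervention in activation space
        return h


# Usage: steer once at layer 0, then undo the intervention.
model = ToyModel([[[1.0, 0.0], [0.0, 1.0]],    # layer 0 weights (hypothetical)
                  [[1.0, 1.0], [1.0, -1.0]]])  # layer 1 weights (hypothetical)
snapshot = deepcopy(model.weights)
x = [1.0, 2.0]

base = model.forward(x)                          # [3.0, -1.0]
remove = model.add_steering(layer=0, vector=[1.0, 0.0], alpha=2.0)
steered = model.forward(x)                       # [5.0, 1.0]
remove()
restored = model.forward(x)                      # [3.0, -1.0]

print(base, steered, restored)
print("weights unchanged:", model.weights == snapshot)  # weights unchanged: True
```

A weight-based alternative such as fine-tuning would instead overwrite entries of `model.weights`, changing the output for every input until those weights were restored. The hook keeps the change local to one layer and one forward pass configuration and leaves the parameters untouched.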

Interpretability & Mechanistic Interp, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 107

Year: 2026

Venue: N/A
