Tsinghua AI · Cisco Research · CUHK · NJIT · UNC
May 25, 2026 · arXiv:2605.25903

Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani, Yingcong Li, Mengnan Du

AI Summary

The paper introduces Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain, in natural language, the activations of diverse "donor" models. UAV uses lightweight adapters to map donor activations into soft tokens in the decoder's embedding space, which enables cross-model activation explanation. Experiments on classification, fact retrieval, and gist summarization show that UAV is competitive with self-explanation baselines while supporting verbalization across model families and scales.
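To make the adapter idea concrete, here is a minimal sketch. It assumes the adapter is a single affine map that turns each donor activation vector into one soft token of the decoder's embedding width, and that the soft tokens are prepended to an embedded text prompt. Neither the adapter architecture nor the placement of soft tokens is specified here, so `LinearSoftTokenAdapter`, `build_decoder_input`, the toy dimensions, and the random stand-in weights are all hypothetical.

```python
import random
from typing import List

Vector = List[float]


class LinearSoftTokenAdapter:
    """Hypothetical adapter: one donor activation -> one decoder-space soft token.

    Assumption: a single affine map y = W h + b with W of shape
    (d_decoder, d_donor). Random weights stand in for trained ones.
    """

    def __init__(self, d_donor: int, d_decoder: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d_donor = d_donor
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_donor)]
                       for _ in range(d_decoder)]
        self.bias = [0.0] * d_decoder

    def project(self, activation: Vector) -> Vector:
        # Reject activations whose width does not match this donor.
        if len(activation) != self.d_donor:
            raise ValueError(f"expected {self.d_donor} dims, got {len(activation)}")
        return [sum(w * a for w, a in zip(row, activation)) + b
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, activations: List[Vector]) -> List[Vector]:
        # One soft token per donor activation (e.g. per token position).
        return [self.project(h) for h in activations]


def build_decoder_input(soft_tokens: List[Vector],
                        prompt_embeddings: List[Vector]) -> List[Vector]:
    """Prepend soft tokens to the embedded prompt (placement is an assumption)."""
    width = len(prompt_embeddings[0]) if prompt_embeddings else None
    if width is not None and any(len(t) != width for t in soft_tokens):
        raise ValueError("soft tokens must match the decoder embedding width")
    return soft_tokens + prompt_embeddings


# Usage: two donors with different hidden sizes feed the same 4-dim decoder.
adapter_a = LinearSoftTokenAdapter(d_donor=6, d_decoder=4, seed=1)
adapter_b = LinearSoftTokenAdapter(d_donor=3, d_decoder=4, seed=2)
prompt = [[0.0] * 4, [1.0] * 4]  # stand-in for an embedded explanation prompt
seq_a = build_decoder_input(adapter_a([[0.1] * 6, [0.2] * 6]), prompt)
seq_b = build_decoder_input(adapter_b([[0.5] * 3]), prompt)
```

The point of the design is that only the adapter depends on the donor's hidden size; everything downstream sees vectors of the decoder's own width, so one decoder can serve donors of different families and scales.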

Key Contribution

A Rosetta Stone for neural networks: UAV lets one shared decoder explain the activations of other models across model families and scales, needing only a lightweight adapter per donor model.

Abstract

Activation verbalization explains hidden representations in natural language, but existing methods are mostly limited to self-explanation, where each model explains only its own activations. We introduce Universal Activation Verbalizer (UAV), a framework that uses a shared decoder to explain activations from heterogeneous donor models. UAV learns a lightweight adapter that converts donor activations into soft tokens in the decoder's embedding space, and further supports adapter-only transfer by reusing a frozen decoder-side LoRA while training only a new adapter for another donor. Across classification, fact retrieval, and gist summarization, UAV remains competitive with strong self-explanation baselines while enabling cross-model verbalization across model families and scales. Ablations show that decoder-side tuning mainly improves task behavior, whereas the adapter provides the activation-grounded factual and semantic information needed for faithful explanations.
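Adapter-only transfer is essentially a statement about which parameters are updated. The sketch below illustrates it as simple bookkeeping, under assumptions not stated in the abstract: an earlier "joint" stage trains the decoder-side LoRA together with the first donor's adapter, the base decoder weights are never updated, and each adapter is a single linear layer. The names `ParamGroup`, `set_stage`, `linear_adapter_size`, `trainable_count`, and all sizes are hypothetical toy values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ParamGroup:
    num_params: int
    trainable: bool = False


def linear_adapter_size(d_donor: int, d_decoder: int) -> int:
    """Parameter count of a hypothetical linear adapter (weight + bias)."""
    return d_donor * d_decoder + d_decoder


def set_stage(groups: Dict[str, ParamGroup], stage: str, donor: str) -> None:
    """Set trainable flags for one of two assumed training stages.

    'joint':    train the decoder-side LoRA together with `donor`'s adapter.
    'transfer': keep the LoRA frozen and train only `donor`'s new adapter.
    Base decoder weights and other donors' adapters stay frozen in both.
    """
    if stage not in ("joint", "transfer"):
        raise ValueError(f"unknown stage: {stage}")
    for name, group in groups.items():
        if name == f"adapter:{donor}":
            group.trainable = True
        elif name == "decoder_lora":
            group.trainable = stage == "joint"
        else:
            group.trainable = False


def trainable_count(groups: Dict[str, ParamGroup]) -> int:
    # Sum parameters only over groups currently marked trainable.
    return sum(g.num_params for g in groups.values() if g.trainable)


# Usage with toy, made-up sizes.
groups = {
    "decoder_base": ParamGroup(10_000),
    "decoder_lora": ParamGroup(500),
    "adapter:donor_a": ParamGroup(linear_adapter_size(d_donor=16, d_decoder=8)),
}
set_stage(groups, "joint", donor="donor_a")
joint_trainable = trainable_count(groups)     # LoRA + adapter_a = 500 + 136 = 636

groups["adapter:donor_b"] = ParamGroup(linear_adapter_size(d_donor=24, d_decoder=8))
set_stage(groups, "transfer", donor="donor_b")
transfer_trainable = trainable_count(groups)  # adapter_b only = 200
```

In a real framework the same effect would come from toggling gradient tracking on the corresponding weight tensors; the sketch only shows why adding a new donor touches nothing but that donor's adapter.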

Interpretability & Mechanistic Interp · Natural Language Processing

Citation Metrics

Citations: 0 · Influential citations: 0 · References: 0 · Year: 2026 · Venue: N/A
