This paper introduces a method for neuron-level emotion control in speech-generative large audio-language models (LALMs) by identifying and manipulating emotion-sensitive neurons (ESNs). ESNs are discovered via success-filtered activation aggregation, which keeps only generations that both realize the target emotion and preserve the spoken content. The approach enables training-free emotion steering at inference time across three different LALMs, yielding emotion-specific gains that generalize to unseen speakers, as validated by both automatic and human evaluations.
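To make the selection procedure concrete, here is a minimal sketch of success-filtered activation aggregation, assuming pre-extracted per-neuron activations and binary per-run success labels. The function name `select_esns`, the array layout, and the top-k selection rule are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def select_esns(act_emotion, act_neutral, emotion_ok, content_ok, top_k=256):
    """Sketch: pick emotion-sensitive neurons (ESNs) from successful runs.

    act_emotion : (n_runs, n_neurons) activations from target-emotion generations
    act_neutral : (n_runs, n_neurons) activations from neutral generations
    emotion_ok  : (n_runs,) bool, run realized the target emotion
    content_ok  : (n_runs,) bool, run preserved the spoken content
    """
    # Success filter: keep only runs that satisfy BOTH criteria,
    # i.e. emotion realization AND content preservation.
    keep = emotion_ok & content_ok
    if not keep.any():
        raise ValueError("no successful runs to aggregate")

    # Aggregation: mean activation gap between emotional and neutral runs.
    gap = act_emotion[keep].mean(axis=0) - act_neutral[keep].mean(axis=0)

    # Sparse mask: the top_k neurons with the largest absolute gap.
    esn_idx = np.argsort(-np.abs(gap))[:top_k]
    return esn_idx, gap[esn_idx]
```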
Control the emotional tone of generated speech without any training by directly manipulating specific neurons within large audio-language models.
Large audio-language models (LALMs) can produce expressive speech, yet reliable emotion control remains elusive: conversions often miss the target affect and may degrade linguistic fidelity through refusals, hallucinations, or paraphrase. We present, to our knowledge, the first neuron-level study of emotion control in speech-generative LALMs and demonstrate that compact sets of emotion-sensitive neurons (ESNs) are causally actionable, enabling training-free emotion steering at inference time. ESNs are identified via success-filtered activation aggregation enforcing both emotion realization and content preservation. Across three LALMs (Qwen2.5-Omni-7B, MiniCPM-o 4.5, Kimi-Audio), ESN interventions yield emotion-specific gains that generalize to unseen speakers and are supported by automatic and human evaluation. Controllability depends on selector design, mask sparsity, filtering, and intervention strength. Our results establish a mechanistic framework for training-free emotion control in speech generation.
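The intervention itself can be pictured as a small inference-time edit to the selected neurons. Below is a hedged sketch using standard PyTorch forward hooks; the layer path, steering vector, and strength parameter `alpha` are assumptions for illustration, and the abstract's "intervention strength" is modeled here as a simple additive scale.

```python
import torch

def make_esn_hook(esn_idx, steer_vec, alpha=1.0):
    """Forward hook that nudges ESN activations toward the target emotion.

    esn_idx   : LongTensor of neuron indices within the hooked layer
    steer_vec : per-ESN steering values (e.g., the activation gaps above)
    alpha     : intervention strength
    """
    def hook(module, inputs, output):
        out = output.clone()
        # Additive steering applied only to the selected ESNs; all other
        # neurons pass through untouched, so no training is required.
        out[..., esn_idx] += alpha * steer_vec
        return out
    return hook

# Hypothetical usage (real module names depend on the specific LALM):
# handle = model.layers[20].mlp.register_forward_hook(
#     make_esn_hook(esn_idx, torch.tensor(gap_vals, dtype=torch.float32), alpha=2.0))
# ... generate speech with the hook active ...
# handle.remove()
```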