Cross-Model Neuron Transfer (CNT) is introduced as a post-hoc method for transferring safety-oriented functionality between LLMs by selectively transferring a minimal subset of neurons from a donor model to a target model. CNT enables modular function-level adaptation, supporting both function addition and deletion for safety. Experiments across seven LLMs and three safety applications (disalignment, alignment enhancement, bias removal) show CNT achieves targeted functionality transfer with minimal performance degradation (<1%), outperforming baselines.
Stealing just the right neurons from another LLM lets you patch safety holes or remove biases in your own, with almost no performance hit.
The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from constructing safety functionality from scratch to reusing existing functionality from external models, thereby avoiding costly data collection and training procedures. In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. By operating at the neuron level, CNT enables modular function-level adaptation, supporting both function addition and function deletion. We evaluate CNT on seven popular LLMs across three representative applications: safety disalignment, alignment enhancement, and bias removal. Experimental results show that CNT achieves targeted safety-oriented functionality transfer with minimal performance degradation (less than 1% for most models), consistently outperforming five baselines and demonstrating its generality and practical effectiveness.
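The core mechanics of neuron-level transfer can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's implementation: it treats an MLP "neuron" as one row of the input projection plus the matching column of the output projection, and simply copies those weights for a given index set from donor to target. How CNT actually selects the minimal neuron subset is the paper's contribution and is not reproduced here; `neuron_ids` is a hypothetical, externally supplied selection.

```python
import numpy as np

def transfer_neurons(donor_w_in, donor_w_out, target_w_in, target_w_out, neuron_ids):
    """Copy selected MLP neurons from a donor layer into a target layer.

    Toy convention (an assumption, not CNT's actual parameterization):
    neuron i of an MLP layer is row i of w_in (shape: hidden x d_model)
    together with column i of w_out (shape: d_model x hidden).
    """
    patched_in = target_w_in.copy()
    patched_out = target_w_out.copy()
    # Overwrite only the selected neurons; all others keep target weights,
    # which is what keeps the rest of the model's behavior intact.
    patched_in[neuron_ids, :] = donor_w_in[neuron_ids, :]
    patched_out[:, neuron_ids] = donor_w_out[:, neuron_ids]
    return patched_in, patched_out

# Tiny synthetic demo with random weights standing in for two models.
rng = np.random.default_rng(0)
d_model, hidden = 4, 6
donor_in = rng.normal(size=(hidden, d_model))
donor_out = rng.normal(size=(d_model, hidden))
target_in = rng.normal(size=(hidden, d_model))
target_out = rng.normal(size=(d_model, hidden))

neuron_ids = [1, 4]  # hypothetical pre-selected "safety" neurons
patched_in, patched_out = transfer_neurons(
    donor_in, donor_out, target_in, target_out, neuron_ids
)
```

In a real setting the same indexing would be applied per layer to the model's actual MLP weight tensors; "function deletion" would amount to zeroing or overwriting the selected neurons rather than importing donor weights.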