Mar 19, 2026 | arXiv:2603.18793

Functional Subspace Watermarking for Large Language Models

Zikang Ding, Junhao Li, Suling Wu, Junchi Yao, Hongbo Liu, Lijie Hu

AI Summary

This paper introduces Functional Subspace Watermarking (FSW), a novel watermarking framework for LLMs that embeds ownership signals within a stable, low-dimensional functional subspace derived via a generalized eigenvalue problem. By adaptively truncating the spectrum and enforcing vector consistency, FSW achieves a balance between watermark robustness and model utility. Experiments demonstrate FSW's superior detection accuracy and statistical verifiability against various model modifications compared to existing watermarking methods.

Key Contribution

FSW embeds LLM watermarks in a stable functional subspace so that they remain detectable after fine-tuning, quantization, and distillation.

Abstract

Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals into a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, while introducing an adaptive spectral truncation strategy to achieve an optimal balance between robustness and model utility. Furthermore, a vector consistency constraint is incorporated to ensure that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, maintaining robustness that outperforms existing state-of-the-art (SOTA) methods.
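The abstract names two generic building blocks: a generalized eigenvalue problem and a spectral truncation rule. It does not say which matrices FSW uses or how the truncation threshold is chosen. The following is therefore only a minimal sketch of those two ideas in isolation, not the authors' algorithm. The matrices `A` and `B`, the helper names `generalized_eigh` and `truncate_by_energy`, and the cumulative-energy rule are all assumptions introduced for illustration.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A v = lambda B v for symmetric A and symmetric positive-definite B.

    Hypothetical helper: whitens the problem with a Cholesky factor of B,
    then solves an ordinary symmetric eigenproblem.
    """
    L = np.linalg.cholesky(B)          # B = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T            # whitened matrix, symmetric up to rounding
    C = (C + C.T) / 2.0                # enforce exact symmetry
    w, U = np.linalg.eigh(C)           # eigenvalues in ascending order
    V = L_inv.T @ U                    # map back: v = L^{-T} u
    order = np.argsort(w)[::-1]        # sort descending
    return w[order], V[:, order]

def truncate_by_energy(eigvals, energy=0.9):
    """Return the smallest k whose leading eigenvalues hold `energy` of the total.

    Assumed truncation rule (cumulative spectral energy); the paper's
    adaptive strategy may differ.
    """
    vals = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)
    cum = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(cum, energy)) + 1
    return min(k, len(vals))

# Toy usage with random stand-in matrices (not real model statistics).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = rng.normal(size=(200, 6))
A = X.T @ X / 200                      # stand-in for a "signal" matrix
B = Y.T @ Y / 200 + 1e-3 * np.eye(6)   # stand-in for a "perturbation" matrix
w, V = generalized_eigh(A, B)
k = truncate_by_energy(w, energy=0.9)
subspace = V[:, :k]                    # leading k generalized eigenvectors
```

In this sketch the columns of `V` are `B`-orthonormal (`V.T @ B @ V` is the identity), so the leading `k` columns span the directions where `A` is largest relative to `B`. Whether FSW uses this ordering or this normalization is not stated in the abstract.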

Inference & Quantization | Natural Language Processing | Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
