Weitong Chen

1 Yangzhou University, 2 Nanjing University, 3 Shandong University, 4 La Trobe University, 5 University of Technology Sydney
MX120240572@stu.yzu.edu.cn {wtchen,jialezhang}@yzu.edu.cn chengchengzhu@smail.nju.edu.cn chunpeng_ge@sdu.edu.cn d.wu@latrobe.edu.au guodong.long@uts.edu.au
Corresponding author.

Abstract

Deep learning-based watermarking has made remarkable progress in recent years. To achieve robustness against various distortions, current methods commonly adopt a training strategy in which a single random distortion (SRD) is chosen as the noise layer for each training batch. However, the SRD strategy treats distortions independently within each batch, neglecting the inherent relationships among different types of distortions and causing optimization conflicts across batches. As a result, the robustness and generalizability of the watermarking model are limited. To address this issue, we propose a novel training strategy that enhances robustness and generalization via meta-learning with feature consistency (Meta-FC). Specifically, we randomly sample multiple distortions from the noise pool to construct a meta-training task, while holding out one distortion as a simulated “unknown” distortion for the meta-testing phase. Through meta-learning, the model is encouraged to identify and utilize neurons that exhibit stable activations across different types of distortions, mitigating the optimization conflicts caused by randomly sampling diverse distortions in each batch. To further promote the transformation of stable activations into distortion-invariant representations, we introduce a feature consistency loss that constrains the decoder features of the same image under different distortions to remain consistent.
Extensive experiments demonstrate that, compared with the SRD training strategy, Meta-FC improves the robustness and generalization of various watermarking models by an average of 1.59%, 4.71%, and 2.38% under high-intensity, combined, and unknown distortions, respectively.

1 Introduction

Robust digital watermarking is a crucial technology for copyright protection [16, 3, 4] and has been extensively researched. With the rapid progress of deep learning, a growing number of watermarking approaches based on deep neural networks (DNNs) have been developed. These methods [37, 2, 9, 7, 30] typically adopt an end-to-end framework consisting of an encoder, a noise layer, and a decoder (END). The encoder embeds the watermark message into the cover image, while the decoder extracts the watermark message from the distorted image. The noise layer simulates various distortions during training to enhance the robustness of the model.

Figure 1: Comparison of the training processes of SRD and Meta-FC. (a) The SRD pipeline. (b) The Meta-FC pipeline. In the meta-training phase, the model learns to be robust against known distortions. In the meta-testing phase, the model is evaluated under a simulated “unknown” distortion. Note that no truly unknown distortions are involved at any point during training.

To advance the capabilities of deep learning-based watermarking, current research has largely centered on refining network architectures [10, 24, 19] or developing sophisticated optimization objectives [13, 11, 8, 22, 32]. Although these approaches have achieved notable improvements, their gains in robustness and generalization against diverse distortions remain limited. We identify a critical yet often overlooked factor contributing to this limitation: the prevalent reliance on the single random distortion (SRD) training strategy [37, 23, 18, 15, 12].
Under the SRD paradigm, a single distortion is randomly selected from a predefined noise pool (e.g., JPEG, Crop) for each training batch, as illustrated in Figure 1(a). However, this batch-by-batch, single-distortion approach inherently isolates the learning process for each distortion type. We argue that this isolation leads to two primary issues: 1) overfitting to distortion-specific features rather than learning truly distortion-invariant representations, and 2) optimization instability, such as gradient conflicts across different distortions, which prevents the model from capturing these invariances. Consequently, the performance of watermarking models is limited in three critical scenarios: high-intensity distortions, combined distortions, and unknown distortions (i.e., those not encountered during training).

To address these limitations, we propose Meta-FC, a novel training strategy for watermarking that integrates meta-learning with a feature consistency loss. The core idea of Meta-FC is to improve the generalization of watermarking models by simulating training on known distortions and testing on “unknown” distortions within each training batch. This enables the model to discover distortion-invariant representations that are robust across diverse distortions. Specifically, as illustrated in Figure 1(b), for each batch we randomly sample a subset of distortions from a predefined noise pool as the meta-training distortions and designate a held-out distortion as the meta-testing distortion. During the meta-training phase, the model is optimized across the multiple sampled distortions to obtain temporary encoder and decoder parameters, along with the corresponding meta-training loss. The temporary parameters are then evaluated on the meta-testing distortion, which is excluded from meta-training and thus simulates an “unknown” distortion, to compute the meta-testing loss.
By jointly minimizing both losses, the model is encouraged to learn stable and adaptable parameters that remain effective under diverse and unknown distortion conditions. To further transform such stable and adaptable parameters into distortion-invariant representations, we introduce a feature consistency loss, which aligns the last-layer decoder features of the watermarked image with those of its distorted versions. This alignment encourages the model to extract distortion-invariant representations, thereby improving the reliability of watermark recovery.

The primary contributions of this paper are summarized as follows:
• We reveal that the SRD training strategy inherently suffers from overfitting and gradient conflicts. To this end, we reformulate this training paradigm by proposing a novel meta-learning strategy (Meta-FC).
• Meta-FC simulates training on known distortions and testing on “unknown” distortions within each batch, guiding the model to learn stable and adaptable parameters. Meanwhile, we introduce a feature consistency loss that aligns decoder features between watermarked and distorted images, promoting the learning of distortion-invariant representations.
• Meta-FC is a plug-and-play training strategy that can be seamlessly integrated into any existing END-based watermarking model.
• Extensive experiments demonstrate that Meta-FC significantly improves the robustness and generalization of different watermarking models under high-intensity, combined, and unknown distortions.

2 Related Work

2.1 Deep Learning for Robust Watermarking

In recent years, with the advancement of deep learning, numerous DNN-based watermarking frameworks have been proposed [37, 2, 27, 23, 18, 25, 12, 31, 36, 22]. HiDDeN [37] introduced the first end-to-end watermarking model based on an END architecture. ReDMark [2] improved the imperceptibility and robustness of watermarks by incorporating residual structures into the encoder.
StegaStamp [27] achieved robustness against print-and-capture distortions by combining differentiable noise-simulation layers with spatial transformation modules, providing a novel approach to handling non-differentiable distortions. TSDL [23] employed a two-stage separable architecture to achieve robustness against black-box distortions. MBRS [18] enhanced robustness against JPEG compression by cyclically training on clean images, real JPEG images, and simulated JPEG images across mini-batches. FIN [12] proposed a flow-based watermarking framework that exploits the invertibility of invertible neural networks (INNs) to share weights between the encoder and decoder, thereby tightly coupling the two. Subsequent works [8, 26] have further improved robustness by modifying internal model designs, improving noise-layer simulation, and altering the END architecture. These methods adopt the SRD training strategy, in which different distortions alternate across training batches to achieve robustness against each of them. However, SRD cannot effectively capture the commonalities among different distortions, making it difficult for the model to learn distortion-invariant representations.

Figure 2: The overall training process of the proposed Meta-FC. First, the main encoder and decoder process images under the meta-training distortions, yielding the meta-training loss $\mathcal{L}_{meta\text{-}train}$ (composed of $\mathcal{L}_{w,n}$ and $\mathcal{L}^{tra}_{msg}$) and producing temporary encoder and decoder parameters. Next, these temporary parameters are evaluated on the meta-testing distortion to calculate the meta-testing loss $\mathcal{L}_{meta\text{-}test}$ (composed of $\mathcal{L}^{tes}_{msg}$). Subsequently, the image loss $\mathcal{L}_{img}$ (composed of $\mathcal{L}^{tra}_{img}$ and $\mathcal{L}^{tes}_{img}$) is computed from the watermarked images generated by the main and temporary encoders. Finally, the main model parameters are updated by minimizing the total loss $\mathcal{L}_{total}$, which consists of $\mathcal{L}_{meta\text{-}train}$, $\mathcal{L}_{meta\text{-}test}$, and $\mathcal{L}_{img}$.

2.2 Meta-learning

Meta-learning [17, 29, 35] aims to enhance a model’s ability to “learn how to learn”, enabling it to adapt more effectively to new tasks. MAML [14] optimizes the model’s initial parameters through a bi-level (inner and outer) gradient update, allowing the model to be fine-tuned quickly for new tasks. MLDG [20] uses meta-learning to simulate domain shifts during training, guiding the model to learn domain-invariant features and significantly enhancing its generalization ability. MGAA [34] uses meta-learning to bridge the gradient gap between white-box and black-box attacks, improving the transferability of adversarial attacks. MLDGG [28] combines meta-learning with graph neural networks to enhance the model’s adaptability in cross-graph domain tasks. In essence, these methods leverage meta-learning to mine features that are invariant across different tasks, improving the model’s generalization ability. Inspired by these works, we design a meta-learning training strategy that enables watermarking models to learn invariant features of images under various distortions, thereby improving their generalization ability.

3 Proposed Meta-FC Method

3.1 Key Insight and Intuition

Existing deep learning-based watermarking methods typically achieve robustness against distortions by applying the SRD training strategy. However, SRD handles each distortion independently, without modeling the underlying commonalities among different distortions, which limits the model’s generalization ability.
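To make the contrast concrete, the per-batch distortion selection of SRD (Section 1) and the meta-training/meta-testing split of Meta-FC (Section 3.2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the distortion names in the hypothetical `NOISE_POOL` are placeholders, and the helper names `srd_sample` and `meta_fc_split` are invented for this sketch.

```python
import random

# HYPOTHETICAL noise pool with m + 1 = 5 distortions; the names are placeholders only.
NOISE_POOL = ["Identity", "JPEG", "Crop", "Dropout", "GaussianNoise"]


def srd_sample(pool, rng):
    """SRD: draw a single distortion that acts as the noise layer for the whole batch."""
    return rng.choice(pool)


def meta_fc_split(pool, rng):
    """Meta-FC: m distortions for meta-training, the remaining one held out for meta-testing."""
    order = list(pool)  # copy so the shared pool is not mutated
    rng.shuffle(order)  # random order, so the held-out distortion changes per batch
    return order[:-1], order[-1]


rng = random.Random(0)
for batch in range(3):
    print("SRD     batch", batch, "->", srd_sample(NOISE_POOL, rng))
    meta_train, meta_test = meta_fc_split(NOISE_POOL, rng)
    print("Meta-FC batch", batch, "-> meta-train", meta_train, "| meta-test", meta_test)
```

Under SRD, each batch sees exactly one distortion, so gradients from different distortions never meet within a single update. Under the Meta-FC split, every batch touches all $m+1$ distortions, with a different one playing the simulated “unknown” role each time.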
Inspired by the success of meta-learning in domain generalization, we observe that meta-learning facilitates the extraction of shared representations across multiple domains through iterative meta-training and meta-testing. This paradigm extends naturally to modeling diverse distortions in watermarking. Building on this insight, we formulate different combinations of distortions as tasks within a meta-learning framework and train the model to continuously adapt and optimize across diverse distortions. This approach gradually discovers a set of parameters that are resilient to various perturbations, thereby alleviating the optimization conflicts arising from competing distortion objectives. Furthermore, by simulating “unknown” distortions, the meta-testing phase evaluates the model’s generalization to novel distortions during training, which improves its robustness against distortions not encountered during training.

3.2 Meta-FC Pipeline

The proposed Meta-FC is a model-agnostic training strategy. As shown in Figure 2, we simulate meta-training with known distortions and meta-testing with an “unknown” distortion to narrow the gap between the gradient directions of known and “unknown” distortions, improving the generalization of the watermarking model. Specifically, given a noise pool containing $m+1$ distortions, in each batch we randomly sample $m$ distortions as meta-training distortions and use the remaining one as the meta-testing distortion.

3.2.1 Meta-train.

The main encoder $\mathcal{E}$ and decoder $\mathcal{D}$ are trained using the $m$ randomly selected meta-training distortions, yielding $m$ decoded messages $\mathcal{M}^{i}_{tra}$ and their corresponding last-layer decoder features. The decoding loss $\mathcal{L}^{tra}_{msg}$ is calculated from $\mathcal{M}^{i}_{tra}$, while the feature consistency loss $\mathcal{L}_{w,n}$ is calculated from the extracted features, as detailed in Section 3.3 (Feature Consistency Loss).
The losses $\mathcal{L}^{tra}_{msg}$ and $\mathcal{L}_{w,n}$ are aggregated to form the meta-training loss $\mathcal{L}_{meta\text{-}train}$, which is used to update the model parameters and obtain a temporary encoder $\mathcal{E}^{\prime}$ and decoder $\mathcal{D}^{\prime}$. This adaptation across multiple distortions helps mitigate the optimization conflicts introduced by differing distortion objectives. $\mathcal{L}_{meta\text{-}train}$ is defined as follows:

$$
\begin{gathered}
\mathcal{L}^{tra}_{msg}=\sum_{i=1}^{m}\textit{MSE}(\mathcal{M}_{en},\mathcal{M}^{i}_{tra})=\sum_{i=1}^{m}\textit{MSE}\big(\mathcal{M}_{en},\mathcal{D}(\theta_{d},\mathcal{I}^{i}_{tra})\big),\\[3pt]
\mathcal{L}_{meta\text{-}train}=\mathcal{L}^{tra}_{msg}+\lambda_{f}\cdot\mathcal{L}_{w,n},
\end{gathered}
\tag{1}
$$

where $\mathcal{M}_{en}$ denotes the embedded watermark message and MSE denotes the mean squared error. $\mathcal{I}^{i}_{tra}$ denotes the watermarked image after the $i$-th meta-training distortion, and $\mathcal{D}(\theta_{d},\cdot)$ is the decoder with parameters $\theta_{d}$. The hyperparameter $\lambda_{f}$ balances $\mathcal{L}^{tra}_{msg}$ and $\mathcal{L}_{w,n}$ and is set to $0.001$ by default.

3.2.2 Meta-test.

After training on the meta-training distortions, the sampled meta-testing distortion is used to simulate an “unknown” distortion. The temporary encoder $\mathcal{E}^{\prime}$ and decoder $\mathcal{D}^{\prime}$ are used to compute the meta-testing loss $\mathcal{L}_{meta\text{-}test}$, which evaluates the generalization of the model to previously “unknown” distortions.
$\mathcal{L}_{meta\text{-}test}$ is defined as:

$$
\begin{gathered}
\mathcal{L}^{tes}_{msg}=\textit{MSE}(\mathcal{M}_{en},\mathcal{M}_{tes})=\textit{MSE}\big(\mathcal{M}_{en},\mathcal{D}^{\prime}(\theta^{\prime}_{d},\mathcal{I}_{tes})\big),\\
\mathcal{L}_{meta\text{-}test}=\mathcal{L}^{tes}_{msg},
\end{gathered}
\tag{2}
$$

where $\mathcal{L}^{tes}_{msg}$ is the message loss in the meta-testing phase and $\mathcal{M}_{tes}$ is the message decoded from the distorted image $\mathcal{I}_{tes}$.

Moreover, the watermarked image should preserve the visual quality of the cover image $\mathcal{I}_{co}$. To this end, we also introduce an image loss $\mathcal{L}_{img}$, computed using both the main and temporary encoders:

$$
\begin{gathered}
\mathcal{L}^{tra}_{img}=\textit{MSE}(\mathcal{I}_{co},\mathcal{I}^{tra}_{w})=\textit{MSE}\big(\mathcal{I}_{co},\mathcal{E}(\theta_{e},\mathcal{I}_{co},\mathcal{M}_{en})\big),\\
\mathcal{L}^{tes}_{img}=\textit{MSE}(\mathcal{I}_{co},\mathcal{I}^{tes}_{w})=\textit{MSE}\big(\mathcal{I}_{co},\mathcal{E}^{\prime}(\theta^{\prime}_{e},\mathcal{I}_{co},\mathcal{M}_{en})\big),\\
\mathcal{L}_{img}=\mathcal{L}^{tra}_{img}+\mathcal{L}^{tes}_{img},
\end{gathered}
\tag{3}
$$

where $\mathcal{I}^{tra}_{w}$ and $\mathcal{I}^{tes}_{w}$ are the watermarked images generated by the main encoder $\mathcal{E}(\theta_{e},\cdot)$ and the temporary encoder $\mathcal{E}^{\prime}(\theta^{\prime}_{e},\cdot)$, respectively.

Finally, we aggregate $\mathcal{L}_{meta\text{-}train}$, $\mathcal{L}_{meta\text{-}test}$, and $\mathcal{L}_{img}$ through a weighted combination to update the parameters of the main model, jointly optimizing imperceptibility and robustness.
The total loss $\mathcal{L}_{total}$ is defined as:

$$
\mathcal{L}_{total}=\lambda_{1}\cdot\frac{\mathcal{L}_{meta\text{-}train}+\mathcal{L}_{meta\text{-}test}}{m+1}+\lambda_{2}\cdot\frac{\mathcal{L}_{img}}{2},
\tag{4}
$$

where $\lambda_{1}$ and $\lambda_{2}$ are hyperparameters that balance robustness and imperceptibility. Throughout training, the weighting factors $\lambda_{1}$ and $\lambda_{2}$ are dynamically adjusted, similar to [33]. In the early training stages, the model focuses on learning robust watermark decoding; therefore, $\lambda_{1}$ is initialized to $5$ and $\lambda_{2}$ to a relatively low value of $1$. This design helps stabilize optimization, since the embedded watermark signal is inherently weaker than the image content. As training progresses and the model achieves stable watermark recovery under various distortions, $\lambda_{1}$ is gradually decreased to its predefined minimum of $1$, while $\lambda_{2}$ is progressively increased to its maximum of $15$ to further improve the visual quality of the watermarked images.

Algorithm 1 Meta-FC
Input: cover images $\mathcal{I}_{co}$, watermark message $\mathcal{M}_{en}$, watermark encoder $\mathcal{E}$, watermark decoder $\mathcal{D}$, noise pool $\mathcal{N}$ containing $m+1$ distortions.
Output: best model parameters.
1: Initialization;
2: for $k\in[0,\mathit{maxiter}]$ do
3:   Randomly sample $m$ distortions from the noise pool;
4:   // Meta-Train
5:   $\mathcal{I}^{tra}_{w}\leftarrow\mathcal{E}(\mathcal{I}_{co},\mathcal{M}_{en})$;
6:   for $i\in[1,m]$ do
7:     $\mathcal{I}^{i}_{tra}\leftarrow\mathcal{N}(\mathcal{I}^{tra}_{w})$; // apply the $i$-th meta-training distortion
8:     $\mathcal{M}^{i}_{tra}\leftarrow\mathcal{D}(\mathcal{I}^{i}_{tra})$;
9:   end for
10:  Compute $\mathcal{L}_{w,n}$ with Eqs. (5) and (6);
11:  Compute $\mathcal{L}^{tra}_{msg}$ and $\mathcal{L}_{meta\text{-}train}$ with Eq. (1);
12:  Inner update to obtain the temporary model $\mathcal{E}^{\prime}$ and $\mathcal{D}^{\prime}$;
13:  Take the remaining distortion in the noise pool as the meta-testing distortion;
14:  // Meta-Test
15:  $\mathcal{I}^{tes}_{w}\leftarrow\mathcal{E}^{\prime}(\mathcal{I}_{co},\mathcal{M}_{en})$;
16:  $\mathcal{I}_{tes}\leftarrow\mathcal{N}(\mathcal{I}^{tes}_{w})$; // apply the held-out distortion
17:  $\mathcal{M}_{tes}\leftarrow\mathcal{D}^{\prime}(\mathcal{I}_{tes})$;
18:  Compute $\mathcal{L}_{meta\text{-}test}$ with Eq. (2);
19:  // Outer update
20:  Compute $\mathcal{L}_{img}$ and $\mathcal{L}_{total}$ with Eqs. (3) and (4);
21:  Update $\theta_{e}$ and $\theta_{d}$;
22: end for
23: Return: best model parameters.

Figure 3: Visual quality of SRD and our method across different models. The first row shows the cover images, and the second row shows the watermarked images. The third row shows the watermarked images under various distortions. The fourth row shows the residuals, i.e., the differences between the watermarked and cover images, magnified by a factor of 5 for visibility. The final two rows report the PSNR (dB) and SSIM of each model, showing that visual quality remains consistent across training methods for the same model.
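The loss bookkeeping of one Meta-FC iteration in Algorithm 1 (Eqs. (1) to (4), plus the feature consistency term of Eqs. (5) and (6) defined in Section 3.3) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors’ code: messages, images, and features are flat lists of floats; the numbers in the usage lines are made up; and `lambda_schedule` is a hypothetical linear ramp between the stated endpoints ($\lambda_1$: 5 to 1, $\lambda_2$: 1 to 15), since the text only says the weights are adjusted similarly to [33]. It computes loss values only; the inner update that yields $\mathcal{E}^{\prime}$ and $\mathcal{D}^{\prime}$ and the backpropagation of $\mathcal{L}_{total}$ would be handled by an autodiff framework such as PyTorch.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    assert len(a) == len(b), "sequences must have the same length"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def l2_normalize(v):
    """Eq. (5): scale a feature vector to unit L2 norm (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def feature_consistency_loss(f_w, f_no_list):
    """Eq. (6): sum over the m distorted views of 1 - <f_w_bar, f_no_bar_i>."""
    anchor = l2_normalize(f_w)  # features of the undistorted watermarked image
    loss = 0.0
    for f_no in f_no_list:
        f_bar = l2_normalize(f_no)
        loss += 1.0 - sum(a * b for a, b in zip(anchor, f_bar))  # 1 - cosine similarity
    return loss


def meta_train_loss(m_en, m_tra_list, f_w, f_no_list, lambda_f=0.001):
    """Eq. (1): summed message MSE over the m meta-training distortions + lambda_f * L_{w,n}."""
    l_msg_tra = sum(mse(m_en, m_tra) for m_tra in m_tra_list)
    return l_msg_tra + lambda_f * feature_consistency_loss(f_w, f_no_list)


def meta_test_loss(m_en, m_tes):
    """Eq. (2): message MSE of the temporary decoder on the held-out distortion."""
    return mse(m_en, m_tes)


def image_loss(i_co, i_w_tra, i_w_tes):
    """Eq. (3): cover-vs-watermarked MSE for the main and the temporary encoder."""
    return mse(i_co, i_w_tra) + mse(i_co, i_w_tes)


def total_loss(l_meta_train, l_meta_test, l_img, m, lambda1, lambda2):
    """Eq. (4): robustness terms divided by m + 1, image term divided by 2."""
    return lambda1 * (l_meta_train + l_meta_test) / (m + 1) + lambda2 * l_img / 2


def lambda_schedule(step, total_steps):
    """HYPOTHETICAL linear ramp: lambda1 goes 5 -> 1 and lambda2 goes 1 -> 15.

    Only the endpoints come from the text; the real schedule follows [33].
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return 5.0 - 4.0 * t, 1.0 + 14.0 * t


# Toy usage with m = 2 meta-training distortions; all numbers are made up.
m_en = [1.0, 0.0, 1.0, 0.0]
l_tr = meta_train_loss(
    m_en,
    m_tra_list=[[0.9, 0.1, 0.8, 0.0], [1.0, 0.2, 0.7, 0.1]],
    f_w=[0.5, 0.5, 0.0],
    f_no_list=[[0.4, 0.6, 0.1], [0.5, 0.3, 0.2]],
)
l_te = meta_test_loss(m_en, [0.7, 0.3, 0.6, 0.2])
l_im = image_loss([0.2, 0.4], [0.21, 0.39], [0.19, 0.42])
lam1, lam2 = lambda_schedule(step=0, total_steps=100)
print(total_loss(l_tr, l_te, l_im, m=2, lambda1=lam1, lambda2=lam2))
```

The divisions in `total_loss` mirror Eq. (4): the message-related terms are normalized by the $m+1$ distortions seen in the batch, and the image term by the two encoders that produce watermarked images.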
3.3 Feature Consistency Loss

Although meta-learning helps select parameters that are robust across diverse distortions, its ability to improve the representational capacity of the model itself is limited. To enhance distortion-invariant representations, we introduce a feature consistency loss that aligns the decoder features of $\mathcal{I}^{tra}_{w}$ and $\mathcal{I}^{i}_{tra}$. The core intuition is that the decoder features extracted from the undistorted watermarked image $\mathcal{I}^{tra}_{w}$ typically preserve more complete and reliable watermark information. By encouraging the features of $\mathcal{I}^{i}_{tra}$ to match those of $\mathcal{I}^{tra}_{w}$, the model learns to extract watermark-relevant representations that are more resilient to distortions. Let $f_{w}$ denote the last-layer decoder feature vector extracted from $\mathcal{I}^{tra}_{w}$, and let $f_{no}^{i}$ denote the last-layer decoder feature vector extracted from $\mathcal{I}^{i}_{tra}$. These feature vectors are normalized as follows:

$$
\bar{f}_{w}=\frac{f_{w}}{\|f_{w}\|_{2}},\qquad\bar{f}_{no}^{i}=\frac{f_{no}^{i}}{\|f_{no}^{i}\|_{2}}.
\tag{5}
$$

Using $\bar{f}_{w}$ as the anchor, we quantify the discrepancy between $\bar{f}_{w}$ and each $\bar{f}_{no}^{i}$ via cosine similarity:

$$
\mathcal{L}_{w,n}=\sum_{i=1}^{m}\left(1-\cos(\bar{f}_{w},\bar{f}_{no}^{i})\right)=\sum_{i=1}^{m}\left(1-\langle\bar{f}_{w},\bar{f}_{no}^{i}\rangle\right).
\tag{6}
$$

Minimizing this loss encourages the decoder features under all distortions to converge toward a consistent watermark representation, thereby enhancing the model’s robustness to both high-intensity and combined distortions.

4 Experiments

4.1 Experimental Settings

4.1.1 Setting Details.

In this paper, all experiments are implemented using PyTorch [5] and executed on an NVIDIA GeForce RTX 3090 GPU. The models are trained in DIV


Feb 25, 2026


Meta-FC: Meta-Learning with Feature Consistency for Robust and Generalizable Watermarking

Randomly sampling a single distortion per batch when training your watermarking model? Meta-FC shows that meta-learning offers a better way, improving robustness by an average of 4.71% under combined distortions.

Weitong Chen, Chengcheng Zhu, Chunpeng Ge
