SEAL: Entangled White-box Watermarks on Low-Rank Adaptation
Abstract
Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, the issue of copyright protection for LoRA weights, especially through watermark-based techniques, remains underexplored. To address this gap, we propose SEAL (SEcure wAtermarking on LoRA weights), a universal white-box watermarking scheme for LoRA. SEAL embeds a secret, non-trainable matrix between the trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles the passport with the LoRA weights through training, without any extra loss term for entanglement, and distributes the finetuned weights after hiding the passport. When applying SEAL, we observed no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks. We demonstrate that SEAL is robust against a variety of known attacks: removal, obfuscation, and ambiguity attacks.
1 Introduction
Recent years have witnessed an increasing demand for protecting deep neural networks (DNNs) as intellectual properties (IPs), mainly due to the significant cost of collecting quality data and training DNNs on it. In response, researchers have proposed various DNN watermarking methods for DNN copyright protection (Uchida et al., 2017; Darvish Rouhani et al., 2019; Zhang et al., 2018; Fan et al., 2019; Zhang et al., 2020; Xu et al., 2024; Lim et al., 2022), which work by secretly embedding identity messages into the DNNs during training. The IP holders can present the identity messages to a verifier in the event of a copyright dispute to claim ownership.

Meanwhile, recent Parameter Efficient FineTuning (PEFT) (Ding et al., 2023) strategies, particularly Low-Rank Adaptation (LoRA) (Hu et al., 2022), have revolutionized how many domain-specific DNNs - especially Large Language Models (LLMs) (AI@Meta, 2024; Jiang et al., 2023; Team et al., 2024; Yang et al., 2024) and Diffusion Models (DMs) (Rombach et al., 2022) - are built and shared. LoRA’s efficacy stems from its lightweight adaptation layers, which introduce no additional inference overhead while preserving performance similar to fully fine-tuned models (Zhao et al., 2024; Jang et al., 2024; Mangrulkar et al., 2022). These qualities have led to a surge in open-source adaptation, as evidenced by more than 100k publicly shared LoRA weights on platforms such as Hugging Face and Civit AI (Luo et al., 2024). In addition, several variants such as QLoRA (Dettmers et al., 2024), LoRA+ (Hayou et al., 2024), and DoRA (Liu et al., 2024b) have emerged to further optimize resource usage and boost efficient domain adaptation. Due to these factors, LoRA’s training framework has been established as the de facto approach in open-source communities for customizing large pretrained models to domain-specific tasks.
Although LoRA-based methods rely on pretrained foundation models, their uniquely trained adaptation weights themselves represent valuable IP that merits protection. Unfortunately, existing white-box DNN watermarking schemes are not suitable for the LoRA setting, where weights are commonly released as open source, because they only support embedding identity messages in specific architecture-bound components, such as kernels in convolutional layers (Uchida et al., 2017; Liu et al., 2021; Zhang et al., 2020; Lim et al., 2022). By contrast, some approaches focus on utilizing LoRA to protect the original pretrained weights rather than safeguarding the LoRA itself (Feng et al., 2024), exposing these methods’ failure to address LoRA’s unique properties.
To address this gap, we propose SEAL, a universal watermarking scheme designed to protect the copyright of LoRA weights. The key insight of SEAL is the integration of constant matrices, passports, between the LoRA weights, acting as a hidden identity message that is difficult to extract, remove, modify, or even counterfeit, thus offering robust IP protection. During fine-tuning, these passports naturally direct gradients through themselves, eliminating the need for additional constraint losses. After fine-tuning, SEAL seamlessly decomposes the passport into two parts, each integrated into the up and down blocks, ensuring the final model is structurally indistinguishable from standard LoRA weights.
We validate our SEAL against an array of attacks—removal (Han et al., 2016), obfuscation (Yan et al., 2023; Pegoraro et al., 2024), and ambiguity (Fan et al., 2019)—demonstrating that any attempt to remove or disrupt the passport severely degrades model performance. SEAL imposes no performance degradation on the host task; in many cases, it matches or even surpasses the fidelity of standard LoRA weights across various tasks.
In summary, our contributions are three-fold:
1. Simple yet Strong Copyright Protection for LoRA. We present SEAL, a universal watermarking scheme for protecting LoRA weights by embedding a hidden identity message using a constant matrix, the passport, eliminating the need for additional loss terms and offering a straightforward yet robust solution.
2. No Performance Degradation. We demonstrate that applying SEAL does not degrade the performance of the host task. In practice, SEAL consistently achieves performance comparable to or even exceeding that of standard LoRA.
3. Robustness Against Attacks. We demonstrate SEAL’s resilience against various attacks, including removal, obfuscation, and ambiguity attacks, maintaining robust IP protection under severe adversarial conditions.
2 Preliminary
2.1 Low-Rank Adaptation
LoRA (Hu et al., 2022) assumes that task-specific updates lie in a low-rank subspace of the model’s parameter space. It freezes the pretrained weights W_0 ∈ ℝ^{d×k} and trains two low-rank matrices B ∈ ℝ^{d×r} and A ∈ ℝ^{r×k} with r ≪ min(d, k). After training, the adapted weights are:

W' = W_0 + ΔW = W_0 + BA. (1)

Because there are no activation functions between B and A, one can simply add BA to W_0 for efficient integration into the pretrained model.
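As a concrete illustration (a minimal NumPy sketch with placeholder shapes, not the authors' code), merging a trained LoRA update into the frozen base weight looks like this:

```python
import numpy as np

d, k, r = 64, 48, 8            # output dim, input dim, LoRA rank (illustrative)
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, k))   # frozen pretrained weight
B = np.zeros((d, r))               # up block, initialized to zero as in LoRA
A = rng.standard_normal((r, k))    # down block, random init

# ... B and A are trained while W0 stays frozen ...

delta_W = B @ A                    # low-rank update, rank <= r
W_merged = W0 + delta_W            # Eq. (1): no extra inference cost after merging
```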
2.2 White-box DNN Watermarks
White-box DNN watermarking techniques can be broadly categorized by the location of secret embedding or verification into weight-based, activation-based, output-based, and passport-based approaches. We focus on the last category.
- Passport-based. This line of work inserts an additional linear or normalization layer (a passport layer) into the model, so that using the correct passport yields normal performance, while invalid passports degrade accuracy. During ownership verification, the legitimate passport is presented to confirm the model’s fidelity, effectively distinguishing rightful owners from adversaries (Fan et al., 2019; Zhang et al., 2020).
Unlike weight-, activation-, or output-based methods, passport-based watermarking ties model performance to hidden parameters (passports). It does not require special triggers or depend solely on model outputs. Instead, ownership is verified by a passport that restores high accuracy, securely linking model weights and the embedded secret.
2.3 Threat Model and Evaluation Criteria
Attack Types.
Under a white-box setting, adversaries are assumed to have full access to the model weights. We identify three primary attack categories:
(1) Removal, where attackers prune unimportant parameters (LeCun et al., 1989; Han et al., 2016; Uchida et al., 2017; Darvish Rouhani et al., 2019) or finetune the model without watermark constraints (Chen et al., 2021; Guo et al., 2021).
(2) Obfuscation, where attackers restructure the architecture to disrupt watermark extraction (Yan et al., 2023; Pegoraro et al., 2024; Li et al., 2023a) while preserving functional equivalence.
(3) Ambiguity, where counterfeit keys or watermarks are forged to deceive verifiers into believing the adversary is the rightful owner (Fan et al., 2019; Zhang et al., 2020; Chen et al., 2023).
Adversary Assumptions. We assume the adversary obtains the open-sourced weights but lacks access to the original finetuning data and is unable to retrain from scratch. Consequently, they aim to preserve the model’s performance, as excessive degradation would nullify its value. While they know our watermarking scheme (Kerckhoffs’s principle), the secret key itself remains undisclosed.
Evaluation Criteria. A robust DNN watermark should satisfy two requirements (Uchida et al., 2017): Fidelity, meaning the watermark does not degrade the model’s original performance; and Robustness, ensuring the watermark resists removal, obfuscation, and ambiguity attacks.
3 SEAL: The Watermarking Scheme

For clarity, the symbols used throughout this section are listed in Table 6.
3.1 Impact of the Constant Matrix between LoRA
In Fig. 2, we compare the distributions of negative log singular values, −log σ_i, of a standard LoRA model (ΔW = BA) against our proposed SEAL approach (ΔW = BCA) on multiple models: Llama-2-7B/13B (Touvron et al., 2023), Mistral-7B-v0.1 (Jiang et al., 2023), and Gemma-2B (Team et al., 2024). For each trained model, we reconstruct the learned weight ΔW, collect the top-32 singular values from each module, and plot their −log σ_i values as a cumulative distribution function (CDF).
We observe that the SEAL curves systematically shift to the right compared to LoRA. This shift implies that the learned subspace under SEAL is more evenly spread across multiple singular directions, rather than being dominated by just a few large singular values. Such broad coverage in the singular spectrum can bolster robustness: altering or removing the watermark in one direction has a limited effect, as the watermark is “spread out” in multiple directions. Further gradient-based analyses are provided in Appendix B.
3.2 Comparison with Existing Passport Methods
Unlike prior passport-based methods (Fan et al., 2019; Zhang et al., 2020) that typically introduce an additional loss term (a regularization or constraint to embed the passport) and keep the passport layer trainable, SEAL employs a non-trainable matrix C inserted directly between LoRA’s blocks, eliminating the need for auxiliary loss terms. Consequently, our approach differs from existing methods on two key fronts, no extra loss and a non-trainable passport, making a one-to-one comparison problematic.
3.3 Entangling Passports during Training
SEAL embeds the watermark during training by inserting the non-trainable, constant matrix C between the trainable parameters B and A. Doing so effectively entangles the given passport with B and A. The concept of entanglement is superficially similar to the entanglement proposed by Jia et al. (2021), which encourages indistinguishable distributions between the host and watermark tasks. In our context, we define entanglement as follows.
Definition 3.1 (Entanglement).
Given trainable parameters B and A, and a non-trainable parameter C, B and A are in entanglement via C if and only if they produce the correct output for the host task when C is present between them.
As depicted in Alg. 1, C directly influences the computations of B and A during the forward pass and modifies the gradient flow in the backward pass, thereby embedding itself through a normal training process. The IP holder incorporates both C and C′ during training, alternating between them across batches.
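To make Alg. 1's forward pass concrete, the following PyTorch sketch (our own minimal illustration; class and variable names are not from the paper, and initialization details are simplified) registers the passport C as a non-trainable buffer between the trainable blocks, so every gradient to B and A is routed through C. During training, the owner could swap the buffer between C and C′ on alternating batches so that both passports become entangled with B and A.

```python
import torch
import torch.nn as nn

class SEALLinear(nn.Module):
    """Minimal sketch of a SEAL adaptation layer: delta_W = B @ C @ A."""
    def __init__(self, base: nn.Linear, r: int, passport: torch.Tensor):
        super().__init__()
        d, k = base.out_features, base.in_features
        self.base = base                                    # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)     # trainable down block
        self.B = nn.Parameter(torch.zeros(d, r))            # trainable up block
        # Non-trainable passport C (r x r); a buffer is saved with the module
        # but never updated by the optimizer.
        self.register_buffer("C", passport)

    def forward(self, x):
        # The low-rank update is routed through C, entangling B and A with it.
        delta = (x @ self.A.t()) @ self.C.t() @ self.B.t()
        return self.base(x) + delta
```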
3.4 Hiding Passport for Distribution
After successfully establishing the entanglement between the passport and the other trainable parameters, the passport must be hidden before distribution. Therefore, we decompose the IP holder’s passport C into two matrices whose product reconstructs C, as shown in Fig. 1.
Definition 3.2 (Decomposition Function).
For a given constant C ∈ ℝ^{r×r}, a function F is a decomposition function of C if F(C) = (C_1, C_2), where C_1, C_2 ∈ ℝ^{r×r}, such that C_1 C_2 = C.
The decomposition function ensures that models trained with SEAL, which contain three matrices per layer, (B, C, A), can be distributed in a form that resembles standard LoRA implementations with only two matrices, (B_c, A_c). In the decomposition process, the IP holder can camouflage the passport within the open-sourced weights by folding its decomposed components into B and A.
An example of decomposition using SVD is

C = U Σ Vᵀ,

where U, Σ, V ∈ ℝ^{r×r}. Using the SVD decomposition function F_svd(C) = (U Σ^{1/2}, Σ^{1/2} Vᵀ), the resulting components of (B_c, A_c) are

B_c = B U Σ^{1/2} and A_c = Σ^{1/2} Vᵀ A.

We will use F_svd as the default decomposition function unless otherwise specified. Notably, the second passport C′ is not distributed into either B_c or A_c.
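A minimal sketch of this hiding step (our own illustration; F_svd splits C symmetrically via the square root of its singular values, and all names and shapes are placeholders):

```python
import numpy as np

def f_svd(C: np.ndarray):
    """Decompose C into (C1, C2) with C1 @ C2 == C, using SVD."""
    U, s, Vt = np.linalg.svd(C)
    sqrt_S = np.diag(np.sqrt(s))
    return U @ sqrt_S, sqrt_S @ Vt           # C1, C2

d, k, r = 64, 48, 8
rng = np.random.default_rng(0)
B = rng.standard_normal((d, r))              # trained SEAL up block
A = rng.standard_normal((r, k))              # trained SEAL down block
C = rng.standard_normal((r, r))              # hidden passport

C1, C2 = f_svd(C)
B_c, A_c = B @ C1, C2 @ A                    # publicly released weights
assert np.allclose(B_c @ A_c, B @ C @ A)     # functionally identical to B C A
```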
3.5 Extraction on Embedded Passport
To extract the embedded passport from the LoRA-converted SEAL weights (B_c, A_c), we have to assume that B and A, the trained SEAL weights, are full-rank matrices.
Assumption 3.3 (Rank of trained SEAL weights).
Trained SEAL weights B ∈ ℝ^{d×r} and A ∈ ℝ^{r×k} are full-rank matrices with rank(B) = rank(A) = r.
By Assumption 3.3, B and A have pseudo-inverses B⁺ and A⁺ such that B⁺B = AA⁺ = I_r, where I_r is the r × r identity matrix. As shown in Alg. 2, the passport is extracted from ΔW = B_c A_c by multiplying A⁺ and B⁺ on the right and left sides of ΔW, respectively: C = B⁺ ΔW A⁺. Thus, only the legitimate owner, who holds the original SEAL weights B and A, can extract the concealed passport C from ΔW.
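A corresponding extraction sketch (ours; it assumes the owner still holds the original full-rank B and A, with illustrative shapes):

```python
import numpy as np

d, k, r = 64, 48, 8
rng = np.random.default_rng(0)
B = rng.standard_normal((d, r))            # owner's trained SEAL up block
A = rng.standard_normal((r, k))            # owner's trained SEAL down block
C = rng.standard_normal((r, r))            # hidden passport
delta_W = B @ C @ A                        # what the released weights reconstruct

# Full-rank B and A (Assumption 3.3) admit pseudo-inverses with
# B_pinv @ B = I_r and A @ A_pinv = I_r.
B_pinv = np.linalg.pinv(B)
A_pinv = np.linalg.pinv(A)

C_extracted = B_pinv @ delta_W @ A_pinv    # Alg. 2: multiply on the left/right
assert np.allclose(C_extracted, C)         # only the owner, holding B and A, can do this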
3.6 Passport-based Ownership Verification
3.6.1 Extraction
Yan et al. demonstrate that it is possible to neutralize the extraction process of watermarking schemes by altering the distribution of parameters while maintaining functional invariance. Given that the adversary is aware of SEAL and we assume a white-box scenario, the adversary could generate a triplet (B_adv, C_adv, A_adv) for the verifier during the extraction process such that B_adv C_adv A_adv = B_c A_c.
In this process, even if C_adv ≠ C, where C is the truly distributed passport, the verifier could be confused about who the legitimate owner is. For this reason, extraction should only be used when the legitimate owner is attempting to verify whether their passport is embedded in a suspected model. It should not be relied upon in scenarios where a third-party verifier is required for a contested model, as it is vulnerable in such cases.
Therefore, it is crucial to leverage the inherent characteristic of passport-based schemes—where performance degradation occurs if the correct passport is not presented—allowing a third-party verifier to determine the legitimate owner accurately (Fan et al., 2019).
3.6.2 Measuring Fidelity
Recall that SEAL entails two passports, C and C′, both entangled with the LoRA weights B and A. To gauge how similarly these two passports preserve the model’s performance, we define a fidelity gap

D = |M_T(B, C, A) − M_T(B, C′, A)|,

where M_T(·) is the task-specific metric of the adaptation layer on task T. A small D indicates that C and C′ yield near-identical performance, implying they were jointly entangled during training. By contrast, if two passports are not entangled with (B, A), switching between them degrades the model’s accuracy, producing a large D.
In a legitimate setting, the owner’s pair (C, C′) should incur almost no performance difference (D close to zero). An attacker forging a second passport, however, cannot maintain the same fidelity gap without retraining the entire LoRA model. Hence, D naturally serves as a verification criterion for rightful ownership. The detailed formulation is in Sec. 3.6.3.
Model | Method | BoolQ | PIQA | SIQA | HellaSwag | Wino. | ARC-e | ARC-c | OBQA | Avg. ↑
---|---|---|---|---|---|---|---|---|---|---
LLaMA-2-7B | LoRA | 73.75 | 82.99 | 79.85 | 86.14 | 85.06 | 86.15 | 73.63 | 85.80 | 81.67
 | SEAL (Ours) | 72.70 | 85.27 | 81.27 | 90.15 | 85.79 | 87.07 | 74.60 | 85.00 | 82.73
 | SEAL† (Ours) | 73.19 | 86.31 | 81.95 | 91.21 | 86.69 | 88.55 | 75.51 | 86.80 | 83.78
LLaMA-2-13B | LoRA | 75.57 | 86.98 | 81.39 | 91.82 | 88.53 | 90.08 | 78.78 | 86.67 | 84.98
 | SEAL (Ours) | 75.34 | 87.41 | 83.28 | 93.33 | 88.42 | 90.68 | 79.61 | 86.73 | 85.60
 | SEAL† (Ours) | 75.67 | 88.63 | 83.21 | 93.95 | 89.29 | 91.72 | 81.46 | 88.53 | 86.56
LLaMA-3-8B | LoRA | 74.76 | 88.22 | 80.96 | 92.00 | 86.08 | 90.09 | 82.41 | 86.30 | 85.10
 | SEAL (Ours) | 73.88 | 88.23 | 82.29 | 94.84 | 88.35 | 91.67 | 82.00 | 86.27 | 85.94
 | SEAL† (Ours) | 75.78 | 90.37 | 83.25 | 96.05 | 89.92 | 93.49 | 84.73 | 90.60 | 88.02
Gemma-2B | LoRA | 67.05 | 83.19 | 77.26 | 87.07 | 79.74 | 83.91 | 69.34 | 79.87 | 78.43
 | SEAL (Ours) | 66.56 | 81.79 | 77.65 | 84.82 | 79.16 | 82.79 | 68.40 | 79.20 | 77.55
 | SEAL† (Ours) | 66.70 | 82.50 | 78.88 | 87.57 | 80.19 | 83.81 | 69.97 | 79.87 | 78.68
Mistral-7B-v0.1 | LoRA | 75.92 | 90.72 | 81.78 | 94.68 | 88.69 | 93.10 | 83.36 | 88.30 | 87.07
 | SEAL (Ours) | 73.08 | 87.52 | 81.92 | 91.23 | 87.97 | 90.19 | 78.70 | 88.13 | 84.84
 | SEAL† (Ours) | 76.92 | 90.42 | 82.51 | 94.57 | 90.08 | 93.31 | 83.25 | 91.73 | 87.85
3.6.3 Verification
The fundamental idea behind passport-based watermarking is that any forged passport significantly degrades the model’s performance (Fan et al., 2019), resulting in a large fidelity gap D. As shown in Alg. 3, the suspected model is first checked against the claimant’s (B, C, A) to ensure they reconstruct the same adaptation weights. If so, we measure D between C and C′ (the two passports) via the task metric M_T. Def. 3.4 then concludes that ownership is verified if and only if D ≤ τ:
Definition 3.4 (Verification Process).
Assume B C A = B_c A_c. Define

V(B, C, A, C′) = True if D = |M_T(B, C, A) − M_T(B, C′, A)| ≤ τ, and False otherwise.
In practice, the legitimate owner can submit (B, C, A, C′), achieving D ≈ 0 ≤ τ. Any adversary forging a passport pair lacks the entanglement from training and fails to keep D within the threshold. See Appendix C for an extended discussion.
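The verification logic can be sketched as follows (our own illustration, not the paper's code; here `metric` is assumed to be a callable that scores the adaptation weights on the host task, standing in for the full task evaluation M_T):

```python
from typing import Callable
import numpy as np

def fidelity_gap(B, C, A, C_prime, metric: Callable[[np.ndarray], float]) -> float:
    """D = |M_T(B, C, A) - M_T(B, C', A)|: gap between the two passports."""
    return abs(metric(B @ C @ A) - metric(B @ C_prime @ A))

def verify(B, C, A, C_prime, B_c, A_c,
           metric: Callable[[np.ndarray], float], tau: float) -> bool:
    """Sketch of Alg. 3 / Def. 3.4: check weight reconstruction, then the gap."""
    same_weights = np.allclose(B @ C @ A, B_c @ A_c, atol=1e-5)
    return same_weights and fidelity_gap(B, C, A, C_prime, metric) <= tau
```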
4 Experiments
4.1 Experimental Setup
4.1.1 Fidelity
To demonstrate that the performance of models does not degrade after embedding SEAL passports, we conducted experiments across both language and image modalities. Initially, we evaluate our method on various open-source Large Language Models (LLMs) such as LLaMA-2-7B/13B (Touvron et al., 2023), LLaMA-3-8B (AI@Meta, 2024), Gemma-2B (Team et al., 2024), and Mistral-7B-v0.1 (Jiang et al., 2023) on commonsense reasoning tasks. Next, we verify the method’s effectiveness on instruction tuning tasks. Following this, we extend our approach to the Vision Language Model (VLM) (Liu et al., 2024a) by evaluating performance on visual instruction tuning. Finally, we assess SEAL’s capabilities on image-generative tasks (Rombach et al., 2022).
4.1.2 Robustness
We evaluated the robustness of SEAL against removal, obfuscation, and ambiguity attacks using fidelity scores on commonsense reasoning tasks. For removal and obfuscation attacks, the presence of the extracted watermark was confirmed through hypothesis testing. For ambiguity attacks, fidelity scores were used to distinguish genuine from counterfeit passports, as defined in Def. 3.4.
4.2 Commonsense Reasoning Task
Table 1 presents the performance comparison across commonsense reasoning tasks: BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019), Wino. (Sakaguchi et al., 2021), ARC-e, ARC-c (Clark et al., 2018), and OBQA (Mihaylov et al., 2018). The dataset combines multiple sources, as detailed in (Hu et al., 2023). We train the LLMs for 3 epochs on the combined dataset. The experimental results emphasize that SEAL can be seamlessly integrated into existing LoRA architectures without incurring performance degradation.
Method | Inst. Tune (Textual) MT-B | Inst. Tune (Visual) Acc. | Text-to-Image CLIP-T | Text-to-Image CLIP-I | Text-to-Image DINO.
---|---|---|---|---|---
LoRA | 5.83 | 66.9 | 0.20 | 0.80 | 0.68
SEAL | 5.81 | 63.1 | 0.20 | 0.80 | 0.67
4.3 Textual Instruction Tuning
Table 2 shows the scores for LLaMA-2-7B, instruction tuned with both LoRA and SEAL, using the Alpaca dataset (Taori et al., 2023) for 3 epochs. The scores are averaged ratings given by gpt-4-0613 on a scale of 1 to 10 for the models’ responses to questions from MT-Bench (Zheng et al., 2023). Since the Alpaca dataset is optimized for single-turn interactions, the average score for single-turn performance from MT-Bench is used. The results indicate that SEAL achieves performance comparable to LoRA, thereby confirming its fidelity.
4.4 Visual Instruction Tuning
Table 2 shows the average performance across seven visual instruction tuning benchmarks (Goyal et al., 2017; Hudson & Manning, 2019; Gurari et al., 2018; Lu et al., 2022; Singh et al., 2019; Li et al., 2023b; Liu et al., 2023) for LoRA and SEAL on LLaVA-1.5 (Liu et al., 2024a) with detailed elaboration in Appendix 10. As shown in Table 2, the performance of SEAL is comparable to that of LoRA.
4.5 Text-to-Image Synthesis
Experiments with the Stable Diffusion model (Rombach et al., 2022) on the DreamBooth dataset (Ruiz et al., 2023), trained with LoRA, elucidate the versatility of SEAL when integrated into diverse architectures. Table 2 provides a detailed comparison of subject fidelity, CLIP-I (Radford et al., 2021) and DINO. (Caron et al., 2021), and prompt fidelity, CLIP-T, using the methods employed in (Nam et al., 2024). Our results confirm that SEAL maintains high fidelity and prompt accuracy without any degradation in model performance.

4.6 Integrating with LoRA Variants
Method | Wall Time (h) | Avg. |
---|---|---|
LoRA | 12.0 | 81.67 |
DoRA | 18.5 | 81.98 |
SEAL | 19.6 | 83.78 |
SEAL + DoRA | 27.8 | 81.88 |
Thanks to its flexible framework, SEAL can easily be applied to a wide variety of LoRA variants. In Table 3, we use DoRA (Liu et al., 2024b) as a case study to demonstrate that SEAL can seamlessly integrate with diverse LoRA-based methods, as exemplified by SEAL+DoRA.
Even without any hyperparameter optimization, SEAL+DoRA matches the accuracy of DoRA, highlighting that these variants can coexist with SEAL in a single pipeline without interference. Further details on how SEAL applies to other LoRA variants, including matmul-based and other multiplicative approaches (Edalati et al., 2022; Hyeon-Woo et al., 2021), can be found in Appendix D and E.
Tasks | Acc. | MT-B | p-value |
---|---|---|---|
83.1 | - | - | |
- | 5.81 | - | |
60.2 | 4.94 | ||
0.24 | 3.56 | ||
82.9 | - | ||
- | 3.78 |
4.7 Pruning Attack
Pruning attacks were performed on trained SEAL weights by zeroing out parameters based on their L1 norms, after which we extracted the passport C from the pruned weights. We used statistical testing instead of the Bit Error Rate (BER) because, unlike prior work (Uchida et al., 2017; Fernandez et al., 2024; Zhang et al., 2020; Feng et al., 2024) that embeds a small number of bits, our passport contains a far larger number of bits, necessitating a different approach. In hypothesis testing, if the p-value is smaller than our significance level (α = 0.0005), we reject the null hypothesis that the extracted watermark is an irrelevant matrix uncorrelated with C. Rejecting the null hypothesis implies that the extracted watermark is not random noise but exists within the model.
Fig. 3 illustrates the fidelity score and p-value obtained by zeroing out the smallest parameters of the released weights (B_c, A_c), based on L1 norms. The results demonstrate that removing the watermark necessitates zeroing 99.9% of the weights, which severely impacts the host task’s performance, thereby confirming SEAL’s robustness against pruning attacks.
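A simplified simulation of this attack and detection pipeline is sketched below (our own illustration; we prune the smallest-magnitude entries of the merged update and test the correlation between the extracted matrix and the true passport with a Pearson test, which is one reasonable instantiation of the hypothesis test and may differ from the exact test used in the paper):

```python
import numpy as np
from scipy import stats

d, k, r = 64, 48, 8
rng = np.random.default_rng(0)
B, A = rng.standard_normal((d, r)), rng.standard_normal((r, k))
C = rng.standard_normal((r, r))
B_pinv, A_pinv = np.linalg.pinv(B), np.linalg.pinv(A)

delta_W = B @ C @ A
for prune_ratio in (0.5, 0.9, 0.99):
    W = delta_W.copy()
    thresh = np.quantile(np.abs(W), prune_ratio)   # magnitude (L1-norm) pruning
    W[np.abs(W) < thresh] = 0.0
    C_hat = B_pinv @ W @ A_pinv                    # extract passport from pruned weights
    # Null hypothesis: the extracted matrix is unrelated to the true passport C.
    r_val, p_val = stats.pearsonr(C_hat.ravel(), C.ravel())
    print(f"pruned {prune_ratio:.0%}: corr = {r_val:.3f}, p-value = {p_val:.2e}")
```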

4.8 Finetuning Attack
In this experiment, we aimed to assess the robustness of SEAL’s watermark under finetuning attacks. Specifically, we resumed training on SEAL weights that had already been trained for 3 epochs on one of two tasks: commonsense reasoning and instruction tuning with the Alpaca dataset (Taori et al., 2023). Post-finetuning means continuing training on the respective dataset for 1 additional epoch using standard LoRA training, where the released blocks B_c and A_c are the trainable parameters.
These finetuning scenarios were designed to simulate an adversarial attack in which the model is fine-tuned either on the original dataset or on a different one, e.g., finetuning the commonsense-reasoning model on Alpaca for one epoch, or the Alpaca model on commonsense reasoning for one epoch. After finetuning, we evaluated the robustness of the embedded watermark by extracting it and measuring the p-value. The results demonstrated p-values significantly lower than 5e-4, indicating the passport remains detectable.
4.9 Structural Obfuscation Attack.
Structural obfuscation attacks target the structure of DNN models while maintaining their functionality (Yan et al., 2023; Pegoraro et al., 2024). In the case of LoRA, an adversary cannot change the input dimension k or the output dimension d, but they can modify the rank r of the matrices B_c and A_c. However, even if r is changed, the product remains functionally equivalent to ΔW = BCA, ensuring the distributed passport remains detectable. To mitigate the effects of structural obfuscation with minimal impact on the host task, we decompose ΔW using SVD and modify it based on its singular values, keeping the largest singular values and discarding the smaller ones, resulting in a rank-reduced factorization.
We performed structural obfuscation via SVD: the original rank is 32, and the weights are obfuscated from rank 31 down to 1. The fidelity score remains unchanged, and the passport is still detectable, demonstrating SEAL’s robustness against structural obfuscation attacks.
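The rank-truncation attack itself can be sketched as follows (our own illustration of the SVD-based obfuscation; names and shapes are placeholders):

```python
import numpy as np

def truncate_rank(delta_W: np.ndarray, r_prime: int):
    """Keep only the r' largest singular directions of delta_W."""
    U, s, Vt = np.linalg.svd(delta_W, full_matrices=False)
    B_obf = U[:, :r_prime] * s[:r_prime]   # new "up" block, shape (d, r')
    A_obf = Vt[:r_prime]                   # new "down" block, shape (r', k)
    return B_obf, A_obf

# With delta_W = B_c @ A_c of original rank 32 and r_prime from 31 down to 1,
# B_obf @ A_obf stays close to delta_W for moderate truncation, so the
# extraction C_hat = pinv(B) @ (B_obf @ A_obf) @ pinv(A) remains correlated with C.
```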
4.10 Ambiguity Attack
Model | M_T(C) | M_T(C′) | D
---|---|---|---
LLaMA-2-7B | 82.2 | 82.7 | 0.5 |
Mistral-7B-v0.1 | 84.2 | 87.9 | 3.7 |
Gemma-2B | 76.3 | 76.6 | 0.3 |
In the context of SEAL, ambiguity attacks pose a significant threat when an adversary attempts to create counterfeit passports that can bypass the verification process by generating functionally equivalent weights. Even under the worst-case assumption that the adversary successfully separates B_c A_c into (B_adv, C_adv, A_adv), they must generate another passport, C′_adv, to form the quadruplet required for verification by Def. 3.4.
Table 5 provides the verification thresholds τ. As depicted in Fig. 4, the adversary would need to generate a counterfeit passport that is more than 60% similar to C to keep the fidelity gap below τ. Given the concealed nature of C, achieving this level of similarity is practically impossible, which highlights the effectiveness of our approach in maintaining the security of the ownership verification process. In conclusion, SEAL significantly reduces the risk of ambiguity attacks by ensuring that counterfeit passports generated without knowledge of C are unlikely to maintain the required fidelity score.
5 Conclusion
We introduced SEAL, a novel watermarking scheme specifically tailored for LoRA weights. By inserting a constant matrix during LoRA training and factorizing it afterward, our approach enables robust ownership verification without impairing the model’s performance. Empirical results on commonsense reasoning, instruction tuning, and text-to-image tasks confirm both high fidelity and strong resilience against removal, obfuscation, and ambiguity attacks.
Although our experiments focus on LoRA, the core idea—using a non-trainable matrix to entangle trainable parameters—may extend to other parameter-efficient finetuning (PEFT) methods or larger foundation models. Future work will explore generalized forms of this embedding mechanism, aiming to protect a broader range of adaptation techniques while maintaining minimal overhead.
Impact Statement
Our scheme helps content creators and organizations safeguard intellectual property in lightweight, easily distributed LoRA-based models. This fosters more open collaboration in AI communities by alleviating concerns about unauthorized use or redistribution of finetuned checkpoints.
However, no defense is fully immune to new adversarial strategies, and watermarking could be misused to embed covert or unethical content. We thus advocate for transparent guidelines and continuous evaluation to ensure that watermarking remains a fair and dependable approach for protecting intellectual property in open-source AI.
References
- AI@Meta (2024) AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md.
- Bisk et al. (2020) Bisk, Y., Zellers, R., Gao, J., Choi, Y., et al. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp. 7432–7439, 2020.
- Caron et al. (2021) Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A. Emerging properties in self-supervised vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.
- Chen et al. (2021) Chen, X., Wang, W., Bender, C., Ding, Y., Jia, R., Li, B., and Song, D. Refit: A unified watermark removal framework for deep learning systems with limited data. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, ASIA CCS ’21, pp. 321–335, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450382878. doi: 10.1145/3433210.3453079. URL https://doi.org/10.1145/3433210.3453079.
- Chen et al. (2023) Chen, Y., Tian, J., Chen, X., and Zhou, J. Effective ambiguity attack against passport-based dnn intellectual property protection schemes through fully connected layer substitution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8123–8132, 2023.
- Clark et al. (2019) Clark, C., Lee, K., Chang, M.-W., Kwiatkowski, T., Collins, M., and Toutanova, K. Boolq: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044, 2019.
- Clark et al. (2018) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457, 2018.
- Darvish Rouhani et al. (2019) Darvish Rouhani, B., Chen, H., and Koushanfar, F. Deepsigns: An end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems, pp. 485–497, 2019.
- Dettmers et al. (2024) Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems, 36, 2024.
- Ding et al. (2023) Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C.-M., Chen, W., et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235, 2023.
- Edalati et al. (2022) Edalati, A., Tahaei, M., Kobyzev, I., Nia, V. P., Clark, J. J., and Rezagholizadeh, M. Krona: Parameter efficient tuning with kronecker adapter. arXiv preprint arXiv:2212.10650, 2022.
- Fan et al. (2019) Fan, L., Ng, K. W., and Chan, C. S. Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks. Advances in neural information processing systems, 32, 2019.
- Feng et al. (2024) Feng, W., Zhou, W., He, J., Zhang, J., Wei, T., Li, G., Zhang, T., Zhang, W., and Yu, N. Aqualora: Toward white-box protection for customized stable diffusion models via watermark lora. In Forty-first International Conference on Machine Learning, 2024.
- Fernandez et al. (2023) Fernandez, P., Couairon, G., Jégou, H., Douze, M., and Furon, T. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22466–22477, 2023.
- Fernandez et al. (2024) Fernandez, P., Couairon, G., Furon, T., and Douze, M. Functional invariants to watermark large transformers. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4815–4819. IEEE, 2024.
- Goyal et al. (2017) Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6904–6913, 2017.
- Guo et al. (2021) Guo, S., Zhang, T., Qiu, H., Zeng, Y., Xiang, T., and Liu, Y. Fine-tuning is not enough: A simple yet effective watermark removal attack for dnn models. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2021.
- Gurari et al. (2018) Gurari, D., Li, Q., Stangl, A. J., Guo, A., Lin, C., Grauman, K., Luo, J., and Bigham, J. P. Vizwiz grand challenge: Answering visual questions from blind people. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3608–3617, 2018.
- Han et al. (2016) Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. International Conference on Learning Representations, 2016.
- Hayou et al. (2024) Hayou, S., Ghosh, N., and Yu, B. Lora+: Efficient low rank adaptation of large models. In Forty-first International Conference on Machine Learning, 2024.
- Hu et al. (2022) Hu, E. J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9.
- Hu et al. (2023) Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E.-P., Bing, L., Xu, X., Poria, S., and Lee, R. LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Bouamor, H., Pino, J., and Bali, K. (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 5254–5276, Singapore, December 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.319. URL https://aclanthology.org/2023.emnlp-main.319.
- Hudson & Manning (2019) Hudson, D. A. and Manning, C. D. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6700–6709, 2019.
- Hyeon-Woo et al. (2021) Hyeon-Woo, N., Ye-Bin, M., and Oh, T.-H. Fedpara: Low-rank hadamard product for communication-efficient federated learning. arXiv preprint arXiv:2108.06098, 2021.
- Jang et al. (2024) Jang, U., Lee, J. D., and Ryu, E. K. Lora training in the ntk regime has no spurious local minima. arXiv preprint arXiv:2402.11867, 2024.
- Jia et al. (2021) Jia, H., Choquette-Choo, C. A., Chandrasekaran, V., and Papernot, N. Entangled watermarks as a defense against model extraction. In 30th USENIX security symposium (USENIX Security 21), pp. 1937–1954, 2021.
- Jiang et al. (2023) Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
- Kirchenbauer et al. (2024) Kirchenbauer, J., Geiping, J., Wen, Y., Shu, M., Saifullah, K., Kong, K., Fernando, K., Saha, A., Goldblum, M., and Goldstein, T. On the reliability of watermarks for large language models. In The Twelfth International Conference on Learning Representations, 2024.
- Kopiczko et al. (2024) Kopiczko, D. J., Blankevoort, T., and Asano, Y. M. Vera: Vector-based random matrix adaptation. In The Twelfth International Conference on Learning Representations, 2024.
- LeCun et al. (1989) LeCun, Y., Denker, J., and Solla, S. Optimal brain damage. Advances in neural information processing systems, 2, 1989.
- Li et al. (2023a) Li, F.-Q., Wang, S.-L., and Liew, A. W.-C. Linear functionality equivalence attack against deep neural network watermarks and a defense method by neuron mapping. IEEE Transactions on Information Forensics and Security, 18:1963–1977, 2023a.
- Li et al. (2023b) Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, W. X., and Wen, J.-R. Evaluating object hallucination in large vision-language models. arXiv preprint arXiv:2305.10355, 2023b.
- Lim et al. (2022) Lim, J. H., Chan, C. S., Ng, K. W., Fan, L., and Yang, Q. Protect, show, attend and tell: Empowering image captioning models with ownership protection. Pattern Recognition, 122:108285, 2022.
- Liu et al. (2021) Liu, H., Weng, Z., and Zhu, Y. Watermarking deep neural networks with greedy residuals. In ICML, pp. 6978–6988, 2021.
- Liu et al. (2024a) Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual instruction tuning. Advances in neural information processing systems, 36, 2024a.
- Liu et al. (2024b) Liu, S.-Y., Wang, C.-Y., Yin, H., Molchanov, P., Wang, Y.-C. F., Cheng, K.-T., and Chen, M.-H. DoRA: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353, 2024b.
- Liu et al. (2023) Liu, Y., Duan, H., Zhang, Y., Li, B., Zhang, S., Zhao, W., Yuan, Y., Wang, J., He, C., Liu, Z., et al. Mmbench: Is your multi-modal model an all-around player? arXiv preprint arXiv:2307.06281, 2023.
- Loshchilov & Hutter (2019) Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
- Lu et al. (2022) Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., Tafjord, O., Clark, P., and Kalyan, A. Learn to explain: Multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems, 35:2507–2521, 2022.
- Luo et al. (2024) Luo, M., Wong, J., Trabucco, B., Huang, Y., Gonzalez, J. E., Chen, Z., Salakhutdinov, R., and Stoica, I. Stylus: Automatic adapter selection for diffusion models. arXiv preprint arXiv:2404.18928, 2024.
- Mangrulkar et al. (2022) Mangrulkar, S., Gugger, S., Debut, L., Belkada, Y., Paul, S., and Bossan, B. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft, 2022.
- Mihaylov et al. (2018) Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a suit of armor conduct electricity? a new dataset for open book question answering. In EMNLP, 2018.
- Nam et al. (2024) Nam, J., Kim, H., Lee, D., Jin, S., Kim, S., and Chang, S. Dreammatcher: Appearance matching self-attention for semantically-consistent text-to-image personalization, 2024.
- Pegoraro et al. (2024) Pegoraro, A., Segna, C., Kumari, K., and Sadeghi, A.-R. Deepeclipse: How to break white-box dnn-watermarking schemes. arXiv preprint arXiv:2403.03590, 2024.
- Radford et al. (2021) Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning transferable visual models from natural language supervision. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8748–8763. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/radford21a.html.
- Rombach et al. (2022) Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684–10695, June 2022.
- Ruiz et al. (2023) Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., and Aberman, K. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22500–22510, 2023.
- Sakaguchi et al. (2021) Sakaguchi, K., Bras, R. L., Bhagavatula, C., and Choi, Y. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64(9):99–106, 2021.
- Sap et al. (2019) Sap, M., Rashkin, H., Chen, D., LeBras, R., and Choi, Y. Socialiqa: Commonsense reasoning about social interactions. arXiv preprint arXiv:1904.09728, 2019.
- Singh et al. (2019) Singh, A., Natarajan, V., Shah, M., Jiang, Y., Chen, X., Batra, D., Parikh, D., and Rohrbach, M. Towards vqa models that can read. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8317–8326, 2019.
- Taori et al. (2023) Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., and Hashimoto, T. B. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
- Team et al. (2024) Team, G., Mesnard, T., Hardin, C., Dadashi, R., Bhupatiraju, S., Pathak, S., Sifre, L., Rivière, M., Kale, M. S., Love, J., et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.
- Touvron et al. (2023) Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Uchida et al. (2017) Uchida, Y., Nagai, Y., Sakazawa, S., and Satoh, S. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR ’17, pp. 269–277, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450347013. doi: 10.1145/3078971.3078974. URL https://doi.org/10.1145/3078971.3078974.
- Xu et al. (2024) Xu, H., Xiang, L., Ma, X., Yang, B., and Li, B. Hufu: A modality-agnositc watermarking system for pre-trained transformers via permutation equivariance. arXiv preprint arXiv:2403.05842, 2024.
- Yan et al. (2023) Yan, Y., Pan, X., Zhang, M., and Yang, M. Rethinking white-box watermarks on deep learning models under neural structural obfuscation. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 2347–2364, 2023.
- Yang et al. (2024) Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024.
- Yeh et al. (2023) Yeh, S.-Y., Hsieh, Y.-G., Gao, Z., Yang, B. B., Oh, G., and Gong, Y. Navigating text-to-image customization: From lycoris fine-tuning to model evaluation. In The Twelfth International Conference on Learning Representations, 2023.
- Zellers et al. (2019) Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. Hellaswag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830, 2019.
- Zhang et al. (2018) Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., and Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia conference on computer and communications security, pp. 159–172, 2018.
- Zhang et al. (2020) Zhang, J., Chen, D., Liao, J., Zhang, W., Hua, G., and Yu, N. Passport-aware normalization for deep model protection. Advances in Neural Information Processing Systems, 33:22619–22628, 2020.
- Zhang et al. (2023a) Zhang, L., Zhang, L., Shi, S., Chu, X., and Li, B. Lora-fa: Memory-efficient low-rank adaptation for large language models fine-tuning. arXiv preprint arXiv:2308.03303, 2023a.
- Zhang et al. (2023b) Zhang, Q., Chen, M., Bukharin, A., He, P., Cheng, Y., Chen, W., and Zhao, T. Adaptive budget allocation for parameter-efficient fine-tuning. In The Eleventh International Conference on Learning Representations, 2023b.
- Zhao et al. (2024) Zhao, J., Wang, T., Abid, W., Angus, G., Garg, A., Kinnison, J., Sherstinsky, A., Molino, P., Addair, T., and Rishi, D. Lora land: 310 fine-tuned llms that rival gpt-4, a technical report. arXiv preprint arXiv:2405.00732, 2024.
- Zheng et al. (2023) Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., and Stoica, I. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023.
Appendix A Notation
Symbol | Description
---|---
W_0 | Pretrained model weight (size d × k) on which LoRA is applied.
B, A | LoRA’s trainable up and down blocks, where B ∈ ℝ^{d×r}, A ∈ ℝ^{r×k}, and r ≪ min(d, k).
B_c, A_c | Publicly released LoRA weights after distributing the passport (see Def. 3.2). These have the same shape as B, A.
ΔW | The weight offset from LoRA (or SEAL). For instance, ΔW = BA or ΔW = BCA depending on context.
⊗ | The adaptation layer operator; e.g., plain matrix multiplication (ΔW = BA) for standard LoRA, or ΔW = B ⊗ C ⊗ A for SEAL.
C, C′ | Non-trainable passports in SEAL. C is the main passport hidden into (B_c, A_c); C′ is an additional passport for ownership verification. Both are in ℝ^{r×r}.
B_adv, C_adv, A_adv | An adversarial factorization of the publicly released weights that an attacker attempts to construct, e.g., B_adv C_adv A_adv = B_c A_c. In some scenarios, an attacker may generate C′_adv to forge an additional passport. These have the same shape as B, C, A, respectively.
 | A runtime passport (e.g., used in inference or verification) for a given (B, A).
F | Decomposition function that takes C and returns two factors (C_1, C_2) such that C_1 C_2 = C. For example, F_svd uses Singular Value Decomposition (SVD).
T | The host task (e.g., instruction following, QA) to which LoRA (SEAL) is adapted.
M_T | A fidelity score or performance metric (e.g., accuracy) of the adaptation layer on task T.
V | The verification process (function) that checks the authenticity of passports (Sec. 3.6.3). It outputs True or False.
τ | A threshold used in the verification stage to decide ownership claims.
Appendix B Training Process of SEAL
B.1 Forward Path
In SEAL, the forward path produces the output by adding a learnable offset ΔW on top of the base weights W_0:

h = (W_0 + ΔW) x = W_0 x + B C A x. (2)

Here, B and A are trainable matrices, while C is a fixed passport matrix that carries the watermark. Unlike traditional LoRA layers that use ΔW = BA alone, SEAL inserts C between B and A. This additional matrix:
- Forces the resulting offset ΔW to pass through an extra linear transformation, potentially mixing or reorienting the learned directions.
- Ties the final weight update to the presence of C; removing or altering C would disrupt ΔW and hence the model’s functionality.
If C were diagonal, it would merely scale each dimension independently, which can be easier to isolate or undo. However, when C is a full (non-diagonal) matrix, the learned offset may exhibit more complex structure, as the multiplication by C intermixes channels and dimensions. Such a design can lead to a broader singular value distribution (see Fig. 6), where the watermark is spread across multiple directions, thus making it less prone to straightforward removal.
B.2 Backward Path
The backward path computes gradients of the loss function L with respect to B and A, revealing how C influences the updates. Let

y = B C A x, (3)

where x is the input to the adaptation layer. Then, by the chain rule,

∂L/∂B = (∂L/∂y)(C A x)ᵀ, (4)
∂L/∂A = (B C)ᵀ(∂L/∂y) xᵀ. (5)

A small numerical check of these expressions is sketched after the following list.
These expressions highlight two key points:
(1) Transformation of Gradients. Each gradient, ∂L/∂B and ∂L/∂A, is multiplied (from the right or left) by terms involving C. If C were diagonal, this would reduce to element-wise scaling of the gradient, which is relatively simple to reverse or interpret. In contrast, a full C applies a more general linear transformation (potentially a rotation or mixing) to the gradient directions.
(2) Entanglement of Learnable Parameters. Because C is fixed but non-trivial, both B and A are continually updated in a manner dependent on C. Over many gradient steps, ΔW = BCA becomes entangled across multiple dimensions; single-direction modifications of B or A cannot easily isolate the watermark without affecting other directions.
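The gradient expressions in Eqs. (4)-(5) can be checked numerically with autograd; the following is a minimal sketch (ours, not from the paper):

```python
import torch

d, k, r = 16, 12, 4
B = torch.randn(d, r, requires_grad=True)
A = torch.randn(r, k, requires_grad=True)
C = torch.randn(r, r)                          # fixed, non-trainable passport
x = torch.randn(k)

loss = ((B @ C @ A @ x) ** 2).sum()
loss.backward()

g = (2 * (B @ C @ A @ x)).detach()             # dL/dy for y = B C A x
v = (C @ A @ x).detach()
# Eq. (4): dL/dB = g (C A x)^T ;  Eq. (5): dL/dA = (B C)^T g x^T
assert torch.allclose(B.grad, torch.outer(g, v), atol=1e-5)
assert torch.allclose(A.grad, torch.outer((B @ C).t().detach() @ g, x), atol=1e-5)
```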

Impact on Singular Values.
This interplay of forward and backward paths explains why ΔW = BCA often ends up with a different singular value spectrum than that of a simpler ΔW = BA. Intuitively, placing C between B and A introduces:
- Additional mixing in the forward pass: the matrix product with C can redistribute any localized pattern in B or A across more directions.
- Additional mixing in the backward pass: the gradients of B and A are themselves transformed by C, so updates accumulate across multiple directions rather than a few isolated ones.
As a result, the learned update ΔW may exhibit a less concentrated singular value distribution, meaning it is not dominated by just a few principal components. Instead, it becomes harder to nullify or compress the watermark without causing broader distortion in the model.
Practical Advantage.
Because the watermark is effectively “spread” across multiple singular directions, any attempt to remove or alter it by targeting a handful of directions is likely to degrade performance. Thus, from both the forward and backward perspectives, C serves as a robust vehicle for embedding the watermark:
1. It cannot be trivially factored out without retraining B and A (forward path).
2. Its mixing effect on gradients entangles the learned parameters, creating a more diffuse subspace in which the watermark resides (backward path).
These properties collectively bolster SEAL’s resistance to watermark removal attacks, while minimally affecting the primary task performance.
Appendix C On Forging Multiple Passports from a Single Factorization
This section clarifies why an adversary cannot simply factorize the released LoRA weights into some and then create an additional passport in order to circumvent our multi-passport verification. We also reiterate that SEAL is intentionally indistinguishable from a standard LoRA, so an attacker generally cannot even discern that SEAL was used.
C.1 Indistinguishability from Standard LoRA
By design, the publicly distributed weights are simply B_c and A_c, analogous to standard LoRA. No additional matrix parameters (or suspicious metadata) are visible. Hence, without insider knowledge, an attacker cannot tell a priori whether (B_c, A_c) derives from SEAL or a conventional LoRA finetuning. This alone imposes a significant hurdle: an attacker must first determine that a hidden passport exists at all. Only then might they attempt to forge hidden passports.
C.2 Attempting a Single Factorization for Two Passports
Assume, hypothetically, that an attacker somehow knows a given (B_c, A_c) came from SEAL. They might try a factorization of the form

B_adv C_adv A_adv = B_c A_c,

so that the product reconstructs the released weights. Then they could designate C_adv as a forged version of the original C.
Creating a Second Passport.
Furthermore, to break multi-passport verification (see Sec. 3.6.3), the attacker would need another passport, C′_adv, that also yields a near-identical fidelity score when substituted for C_adv.
However, this requires that (B_adv, A_adv) be simultaneously entangled with two distinct passports, which is nontrivial for a single factorization.
C.3 Why a Single Factorization Cannot Produce Two Entangled Passports
- Concurrent Entanglement is Required. In SEAL, C and C′ are co-trained (entangled) with both B and A at the same time during finetuning. This ensures that, for any batch, either C or C′ is used, such that B and A adapt to both passports. Merely performing a post-hoc factorization on (B_c, A_c) does not replicate this simultaneous learning process.
- One Factorization Yields One Mapping. A single factorization typically captures one equivalence, e.g., B_adv C_adv A_adv = B_c A_c. Generating an additional C′_adv that also achieves the same function (or fidelity) using the same (B_adv, A_adv) is a significantly more constrained problem. In practice, an attacker would need to re-finetune twice, once for each passport, effectively mimicking the original training, but without knowledge of the original dataset.
- Costly and Uncertain Outcome. Even if the attacker invests major computational resources, re-training two passports from scratch is as expensive as (or more expensive than) training a brand-new LoRA model. Moreover, success is not guaranteed, since the attacker must ensure C_adv ≠ C′_adv but still replicate near-identical behavior on the entire task, all while not knowing the original dataset or training schedule.
C.4 Proof of Non-Existence of Two Distinct Passports from One Factorization
Assumptions.
We assume the attacker fixes rank-r matrices B_adv ∈ ℝ^{d×r} and A_adv ∈ ℝ^{r×k}.
This aligns with standard LoRA dimensionality and preserves maximum utility (see Remark C.4 below).
Statement.
Suppose the attacker finds two different passports, C_1 ≠ C_2, each in ℝ^{r×r}, satisfying

B_adv C_1 A_adv = B_adv C_2 A_adv = B_c A_c.

We show this leads to a contradiction.
Pseudo-inverse argument (short version).
If the attacker specifically uses the pseudoinverse-based approach, C_i = B_adv⁺ (B_c A_c) A_adv⁺ for i = 1, 2,
then clearly C_1 = C_2, contradicting C_1 ≠ C_2.
More general linear algebra argument (rank-r).
Even without explicitly constructing B_adv⁺ or A_adv⁺, one can show

B_adv (C_1 − C_2) A_adv = 0. (6)

Since B_adv and A_adv each have rank r, Eq. (6) forces C_1 − C_2 = 0, implying C_1 = C_2. Hence, no two distinct passports can arise from the same factorization (B_adv, A_adv).
Remark on rank-deficient factorizations.
If B_adv or A_adv has rank less than r, then infinitely many C can satisfy B_adv C A_adv = B_c A_c. However, such rank-deficient choices almost always degrade the model’s fidelity (losing degrees of freedom), thus failing to preserve the same performance as B_c A_c. Consequently, attackers seeking to maintain full utility have no incentive to choose rank-deficient factorizations. Therefore, we assume full rank r to ensure that B_c A_c is matched faithfully.
C.5 No Practical Payoff for Such an Attack
1. Attackers Typically Lack Data. To even begin constructing a convincing factorization, attackers must have access to the original training data (or a certain proportion of data with a similar distribution) and be certain SEAL was used. Both are high barriers. The training dataset is not part of SEAL and is mostly proprietary; keeping it private does not violate Kerckhoffs’s principle.
2. Equivalent to Costly Re-Training. Producing two passports that match all fidelity checks essentially replicates the original multi-passport entanglement from scratch. This yields no distinct advantage over simply training a new LoRA.
3. Cannot Disprove Legitimate Ownership. Even if they succeed in forging a counterfeit quadruplet, the legitimate owner’s original pair (C, C′) still correctly verifies, preserving the rightful ownership claim.
C.6 Conclusion
In summary, forging multiple passports from a single factorization of (B_c, A_c) is infeasible because SEAL’s multi-passport structure relies on concurrent entanglement of B and A with both passports C and C′ during training. A single post-hoc factorization can at best replicate one equivalent mapping, but not two functionally interchangeable mappings, without a re-finetuning process that is as expensive and uncertain as building a new model. Furthermore, since SEAL weights are indistinguishable from standard LoRA, the attacker generally cannot even detect the scheme in the first place. Therefore, this approach does not offer a viable pathway to break or circumvent SEAL’s multi-passport verification procedure.
Appendix D Extensions to Matmul-based LoRA Variants
Beyond the canonical LoRA (Hu et al., 2022) formulation, numerous follow-up works propose modifications and enhancements while still employing matrix multiplication (matmul) as the underlying low-rank adaptation operator. In this section, we illustrate how SEAL is compatible or can be adapted to these matmul-based variants. Although we do not exhaustively enumerate every LoRA-derived approach, the general principle remains: if the adaptation primarily uses matrix multiplication (possibly with additional diagonal, scaling, or regularization terms), then SEAL can often be inserted by embedding a non-trainable passport between the up and down blocks.
D.1 LoRA-FA (Zhang et al., 2023a)
LoRA-FA (LoRA with a frozen down block) modifies LoRA by keeping the down block A frozen during training, while only the up block B is trained. Structurally, however, it does not alter the fundamental matmul operator. Consequently, integrating SEAL follows the same procedure as standard LoRA: one can embed the passport C into the product B C A without requiring any special adjustments. The difference in training rules (i.e., freezing A) does not affect how C is placed or how it is decomposed into (C_1, C_2) for the final public release.
D.2 LoRA+ (Hayou et al., 2024)
LoRA+ investigates the training dynamics of LoRA’s up (B) and down (A) blocks. In particular, it emphasizes the disparity in gradient magnitudes and proposes using different learning rates:

B ← B − λ η g_B,  A ← A − η g_A,

where λ is a scale factor, η is the base learning rate, and g_B, g_A are the respective gradients. LoRA+ does not alter the structural operator (still matrix multiplication). Therefore, SEAL can be employed by introducing C between B and A, yielding ΔW = B C A. The difference in gradient scaling does not impact the usage of a non-trainable passport matrix C.
D.3 VeRA (Kopiczko et al., 2024)
VeRA introduces two diagonal matrices, Λ_b and Λ_d, to scale different parts of the low-rank factors:

ΔW = Λ_b B Λ_d A,

where B and A may be random, frozen, and shared across layers, and the diagonal elements of Λ_b and Λ_d are trainable. Despite these diagonal scalings, the core operator remains matrix multiplication. Hence, embedding a passport C is still feasible. By leveraging the commutative property of diagonal matrices (re-factoring Λ_d around C so that it can be absorbed into the adjacent factor), SEAL can be inserted:

ΔW = Λ_b B Λ_d C A,

which is functionally identical to the original update except for the hidden passport C. Implementing SEAL in VeRA may require converting the final trained weights back into a standard (B, A) form plus a diagonal scaling term, but the fundamental principle is straightforward.
D.4 AdaLoRA (Zhang et al., 2023b)
AdaLoRA applies a dynamic rank-allocating approach inspired by SVD. It factorizes the weight update into

ΔW = P Λ Q,

where Λ is a diagonal matrix, and P and Q are regularized to maintain near-orthogonality. Since diagonal matrices commute under multiplication (up to a re-factorization around Λ), one can embed a passport C by decomposing it (C = C_1 C_2) and absorbing the factors into the adjacent blocks, e.g., P_c = P C_1 and Q_c = C_2 Q. This preserves the rank-r structure and does not disrupt AdaLoRA’s optimization logic. Regularization terms that enforce near-orthogonality of P and Q remain valid, though one may incorporate C into the initialization or adapt them carefully so as not to degrade the orthogonality constraints.
D.5 DoRA (Liu et al., 2024b)
DoRA modifies the final LoRA update using a column-wise norm factor:

W′ = m · (W_0 + BA) / ‖W_0 + BA‖_c,

where ‖·‖_c computes column-wise norms, m is a trainable magnitude vector, and the norm ratio is (by design) often detached from gradients to reduce memory overhead. Replacing BA with BCA in DoRA does not alter this external gradient-manipulation logic, since C is non-trainable. Thus,

W′ = m · (W_0 + BCA) / ‖W_0 + BCA‖_c

remains valid. The presence of C does not interfere with DoRA’s approach to scaling or norm-based constraints.
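A minimal sketch of how the SEAL passport slots into a DoRA-style layer (our own illustration, not the released SEAL+DoRA code; initialization and scaling details are simplified):

```python
import torch
import torch.nn as nn

class SEALDoRALinear(nn.Module):
    """W' = m * (W0 + B C A) / ||W0 + B C A||_col, with C non-trainable."""
    def __init__(self, W0: torch.Tensor, r: int, passport: torch.Tensor):
        super().__init__()
        d, k = W0.shape
        self.register_buffer("W0", W0)               # frozen pretrained weight
        self.register_buffer("C", passport)          # fixed passport (r x r)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.m = nn.Parameter(W0.norm(dim=0))        # trainable column magnitudes

    def forward(self, x):
        W = self.W0 + self.B @ self.C @ self.A       # adapted weight
        col_norm = W.norm(dim=0).detach()            # norm ratio detached, as in DoRA
        W_dora = self.m * (W / col_norm)             # rescale each column
        return x @ W_dora.t()
```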
D.6 Variants with Non-Multiplicative Operations
All of the above variants preserve the core LoRA assumption of a matrix-multiplication operator for the rank-r adaptation. However, certain approaches introduce non-multiplicative adaptations (e.g., Hadamard product, Kronecker product, or other specialized transforms). The following section discusses how SEAL can be generalized to any bilinear or multilinear operator ⊗ for these cases.
Appendix E Extensions to Generalized Low-Rank Operators
In the main text, we considered a standard LoRA (Hu et al., 2022) that uses a matrix multiplication operator:

$$\Delta W = B\,A,$$

where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. Recent work has explored alternative low-rank adaptation mechanisms beyond simple matmul, such as Kronecker product-based methods (Edalati et al., 2022; Yeh et al., 2023) or even elementwise (Hadamard) product (Hyeon-Woo et al., 2021) forms. Our approach can be extended in a straightforward manner to these generalized operators, which we denote as $\otimes$.
E.1 General Operator $\otimes$
Let $\otimes$ be any bilinear or multilinear operator used for low-rank adaptation. (Here, bilinear means $\otimes$ is linear in each of its arguments when the other is held fixed, e.g., standard matrix multiplication, Kronecker product, or Hadamard product.) We can then write the trainable adaptation layer as

$$\Delta W = B \otimes C \otimes A,$$

where $B$ and $A$ are the trainable low-rank parameters, and $C$ is the non-trainable passport in SEAL. During training, $B$ and $A$ are optimized in conjunction with $C$ held fixed (just as in the matrix multiplication case).
Decomposition Function for Operator $\otimes$.
To distribute $C$ into the released weights $B'$ and $A'$ after training, we require a decomposition function $\operatorname{decompose}$ such that

$$\operatorname{decompose}(C) = (C_1, C_2) \quad \text{with} \quad C_1 \otimes C_2 = C.$$

For example, under the Kronecker product, one could define $\operatorname{decompose}$ to split $C$ into smaller block partitions, or use an SVD-like factorization in an appropriate transformed space. Under the Hadamard product, $\operatorname{decompose}$ could involve elementwise roots or other transformations.
Once $C_1$ and $C_2$ are obtained, we apply:

$$B' = B \otimes C_1, \qquad A' = C_2 \otimes A,$$

so that

$$B' \otimes A' = B \otimes C_1 \otimes C_2 \otimes A = B \otimes C \otimes A = \Delta W.$$

Hence, the final distributed weights $(B', A')$ for public release remain functionally equivalent to using $B \otimes C \otimes A$.
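For the plain matmul operator, one concrete (and by no means unique) choice of $\operatorname{decompose}$ is an SVD-based split; the short sketch below applies it and checks numerically that the released weights reproduce $B\,C\,A$. It is illustrative only.

```python
import torch

def decompose(C: torch.Tensor):
    """Split C into (C1, C2) with C1 @ C2 ~= C via SVD: C1 = U sqrt(S), C2 = sqrt(S) V^T."""
    U, S, Vh = torch.linalg.svd(C)
    root = torch.diag(S.sqrt())
    return U @ root, root @ Vh

torch.manual_seed(0)
d, k, r = 16, 24, 4
B, A = torch.randn(d, r), torch.randn(r, k)
C = torch.randn(r, r)                      # non-trainable passport

C1, C2 = decompose(C)
B_pub, A_pub = B @ C1, C2 @ A              # weights released to the public
assert torch.allclose(B_pub @ A_pub, B @ C @ A, atol=1e-4)
```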
E.2 Implications and Future Directions
• Broader Applicability. By permitting $\otimes$ to be any bilinear or multilinear operator (Kronecker, Hadamard, etc.), SEAL naturally extends beyond the canonical matrix multiplication used in most LoRA implementations. This flexibility can be valuable for advanced parameter-efficient tuning methods (Edalati et al., 2022; Hyeon-Woo et al., 2021; Yeh et al., 2023).
• Same Security Guarantees. The central watermarking principle (embedding a non-trainable passport into the adaptation) does not change. An adversary attempting to re-factor $(B', A')$ to recover $C$ faces the same challenges described in the main text and Appendix C: non-identifiability, cost of reconstruction, and multi-passport verification barriers.
• Potential Operator-Specific Designs. Certain operators (e.g., Kronecker product) may admit additional constraints or factorization strategies that could be exploited for improved stealth or efficiency. Investigating these is an interesting direction for future work.
In summary, SEAL can be generalized to other operators by treating $C$ as a non-trainable factor and defining a suitable decomposition function such that $C_1 \otimes C_2 = C$. This allows us to hide the passport just as in the matrix multiplication case, thereby preserving the main SEAL pipeline for more complex LoRA variants.
Appendix F Training Details
F.1 Commonsense Reasoning Tasks
Models | Gemma-2B | Mistral-7B-v0.1 | LLaMA-2-7B | LLaMA-2-13B | LLaMA-3-8B
---|---|---|---|---|---
LR (LoRA) | 2e-4 | 2e-5 | 2e-4 | 2e-4 | 2e-4
LR (SEAL) | 2e-5 | 2e-5 | 2e-5 | 2e-5 | 2e-5

All other hyperparameters are shared across models and across LoRA/SEAL: r = 32, alpha = 32, Dropout = 0.05, Optimizer = AdamW (Loshchilov & Hutter, 2019), LR scheduler = Linear, Weight Decay = 0, Warmup Steps = 100, Total Batch size = 16, Epochs = 3, Target Modules = Query, Key, Value, UpProj, DownProj.
We conduct evaluations on commonsense reasoning tasks using eight distinct sub-tasks: Boolean Questions (BoolQ) (Clark et al., 2019), Physical Interaction QA (PIQA) (Bisk et al., 2020), Social Interaction QA (SIQA) (Sap et al., 2019), Narrative Completion (HellaSwag) (Zellers et al., 2019), Winograd Schema Challenge (Wino) (Sakaguchi et al., 2021), ARC Easy (ARC-e), ARC Challenge (ARC-c) (Clark et al., 2018), and Open Book QA (OBQA) (Mihaylov et al., 2018).
We benchmark SEAL and LoRA on LLaMA-2-7B/13B (Touvron et al., 2023), LLaMA-3-8B (AI@Meta, 2024), Gemma-2B (Team et al., 2024), and Mistral-7B-v0.1 (Jiang et al., 2023) across these commonsense reasoning tasks.
The hyperparameters used for these evaluations are listed in Table 14.
F.2 Textual Instruction Tuning
Model | LLaMA-2-7B | |
---|---|---|
Method | LoRA | SEAL |
r | 32 | |
alpha | 32 | |
Dropout | 0.0 | |
LR | 2e-5 | |
LR scheduler | Cosine | |
Optimizer | AdamW | |
Weight Decay | 0 | |
Total Batch size | 8 | |
Epoch | 3 | |
Target Modules | All w/o LM HEAD |
F.3 Visual Instruction Tuning
Method | # Params (%) | VQAv2 | GQA | VisWiz | SQA | VQAT | POPE | MMBench | Avg |
---|---|---|---|---|---|---|---|---|---|
FT | 100 | 78.5 | 61.9 | 50.0 | 66.8 | 58.2 | 85.9 | 64.3 | 66.5 |
LoRA | 4.61 | 79.1 | 62.9 | 47.8 | 68.4 | 58.2 | 86.4 | 66.1 | 66.9 |
SEAL | 4.61 | 75.4 | 58.3 | 41.6 | 66.9 | 52.9 | 86.0 | 60.5 | 63.1 |
Model | LLaVA-1.5-7B | |
---|---|---|
Method | LoRA | SEAL |
r | 128 | |
alpha | 128 | |
LR | 2e-4 | 2e-5 |
LR scheduler | Linear | |
Optimizer | AdamW | |
Weight Decay | 0 | |
Warmup Ratio | 0.03 | |
Total Batch size | 64 |
We compared the fidelity of SEAL, LoRA, and FT on the visual instruction tuning tasks with LLaVA-1.5-7B (Liu et al., 2024a). To ensure a fair comparison, we used the original model provided by Liu et al. (2024a) and the same configuration and training dataset as the LoRA setup. We adhere to the setting of Liu et al. (2024a) to filter the training data and design the tuning prompt format. The finetuned models are subsequently assessed on seven vision-language benchmarks: VQAv2 (Goyal et al., 2017), GQA (Hudson & Manning, 2019), VisWiz (Gurari et al., 2018), SQA (Lu et al., 2022), VQAT (Singh et al., 2019), POPE (Li et al., 2023b), and MMBench (Liu et al., 2023).
F.4 Text-to-Image Synthesis
Prompts for Non-Live Objects | Prompts for Live Subjects |
---|---|
a {} in the jungle | a {} in the jungle |
a {} in the snow | a {} in the snow |
a {} on the beach | a {} on the beach |
a {} on a cobblestone street | a {} on a cobblestone street |
a {} on top of pink fabric | a {} on top of pink fabric |
a {} on top of a wooden floor | a {} on top of a wooden floor |
a {} with a city in the background | a {} with a city in the background |
a {} with a mountain in the background | a {} with a mountain in the background |
a {} with a blue house in the background | a {} with a blue house in the background |
a {} on top of a purple rug in a forest | a {} on top of a purple rug in a forest |
a {} with a wheat field in the background | a {} wearing a red hat |
a {} with a tree and autumn leaves in the background | a {} wearing a santa hat |
a {} with the Eiffel Tower in the background | a {} wearing a rainbow scarf |
a {} floating on top of water | a {} wearing a black top hat and a monocle |
a {} floating in an ocean of milk | a {} in a chef outfit |
a {} on top of green grass with sunflowers around it | a {} in a firefighter outfit |
a {} on top of a mirror | a {} in a police outfit |
a {} on top of the sidewalk in a crowded street | a {} wearing pink glasses |
a {} on top of a dirt road | a {} wearing a yellow shirt |
a {} on top of a white rug | a {} in a purple wizard outfit |
a red {} | a red {} |
a purple {} | a purple {} |
a shiny {} | a shiny {} |
a wet {} | a wet {} |
a cube shaped {} | a cube shaped {} |
The DreamBooth dataset (Ruiz et al., 2023) encompasses 30 distinct subjects from 15 different classes, featuring a diverse array of unique objects and live subjects, including items such as backpacks and vases, as well as pets like cats and dogs. Each of the subjects contains 4-6 images. These subjects are categorized into two primary groups: inanimate objects and live subjects/pets. Of the 30 subjects, 21 are dedicated to objects, while the remaining 9 represent live subjects/pets.
For subject fidelity, following Ruiz et al. (2023), we use CLIP-I and DINO. CLIP-I, an image-to-image similarity metric, compares the CLIP (Radford et al., 2021) visual features of the generated images with those of real images of the same subject. DINO (Caron et al., 2021), trained in a self-supervised manner to distinguish different images, is suitable for comparing the visual attributes of the same object generated by models trained with different methods. For prompt fidelity, the image-text similarity metric CLIP-T compares the CLIP features of the generated images and the corresponding text prompts without placeholders, as mentioned in Ruiz et al. (2023) and Nam et al. (2024). For the evaluation, we generated four images for each of the 30 subjects and 25 prompts, resulting in a total of 3,000 images. The prompts used for this evaluation are identical to those originally used in Ruiz et al. (2023) to ensure consistency and comparability across models. These prompts are designed to evaluate subject fidelity and prompt fidelity across diverse scenarios, as detailed in Table 11.
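For reference, the snippet below is a minimal sketch of a CLIP-I-style score using the Hugging Face `transformers` CLIP classes; the checkpoint choice and the omission of pairwise averaging over all reference/generated image pairs are simplifications of ours, not the exact evaluation script.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_i(generated: Image.Image, reference: Image.Image) -> float:
    """Cosine similarity between CLIP image embeddings of two images.

    CLIP-T is analogous, comparing get_image_features against get_text_features.
    """
    inputs = processor(images=[generated, reference], return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return float(feats[0] @ feats[1])
```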
Fig. 7 visually compares LoRA and SEAL on representative subjects from the DreamBooth dataset. The top row shows example reference images for each subject, the middle row shows images generated by LoRA, and the bottom row shows images from our SEAL. Qualitatively, both methods faithfully capture key attributes of each subject (e.g., shape, color, general pose) and produce images of comparable visual quality. That is, SEAL does not degrade or alter the original subject’s appearance relative to LoRA, suggesting that incorporating the constant matrix does not introduce noticeable artifacts or reduce fidelity. These results align with the quantitative metrics on subject and prompt fidelity, indicating that SEAL maintains a quality level on par with LoRA while embedding a watermark in the learned parameters.
(Figure 7: qualitative comparison between LoRA and SEAL on representative DreamBooth subjects.)
Model | Stable Diffusion 1.5 | |
---|---|---|
Method | LoRA | SEAL |
r | 32 | |
alpha | 32 | |
Dropout | 0.0 | |
LR | 5e-5 | 1e-5 |
LR scheduler | Constant | |
Optimizer | AdamW | |
Weight Decay | 1e-2 | |
Total Batch size | 32 | |
Steps | 300 | |
Target Modules | Q K V Out AddK AddV |
Model | LLaMA-2-7B |
---|---|
Method | LoRA |
r | 32 |
alpha | 32 |
LR | 2e-5 |
Optimizer | AdamW |
LR scheduler | Linear |
Weight Decay | 0 |
Warmup Steps | 100 |
Batch size | 16 |
Epoch | 1 |
Target Modules | Query Key Value UpProj DownProj |
Model | LLaMA-2-7B | |||
---|---|---|---|---|
Method | LoRA | SEAL | DoRA | SEAL+DoRA |
r | 32 | |||
alpha | 32 | |||
Dropout | 0.05 | |||
LR | 2e-4 | 2e-5 | 2e-4 | 2e-5 |
Optimizer | AdamW | |||
LR scheduler | Linear | |||
Weight Decay | 0 | |||
Warmup Steps | 100 | |||
Total Batch size | 16 | |||
Epoch | 3 | |||
Target Modules | Query Key Value UpProj DownProj |
Appendix G Ablation Study
G.1 Passport Example
In order to provide a concrete illustration of our watermark extraction process, we construct a small 32×32 grayscale image as the passport $C$. Specifically, we sampled 100 frames from a publicly available YouTube clip, applied center-cropping to each frame, converted them to grayscale, and then downsampled to 32×32. From these frames, we selected one representative image (shown in Fig. 3) to embed as the non-trainable matrix $C$ in our SEAL pipeline (Sec. 3.3).
This tiny passport image, while derived from a movie clip, is both unrecognizable at 32×32 resolution and used exclusively for educational, non-commercial purposes. Nevertheless, it visually demonstrates how a low-resolution bitmap can be incorporated into the model's parameter space and later extracted (possibly with minor distortions) to verify ownership.
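A small sketch of the preprocessing described above (center-crop, grayscale conversion, 32×32 downsampling) is shown below; the file name, crop size, and the final standardization step are illustrative assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms

to_passport = transforms.Compose([
    transforms.CenterCrop(256),                  # assumes frames are at least 256 px on each side
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),                       # -> (1, 32, 32), values in [0, 1]
])

frame = Image.open("frame_042.png")              # one of the sampled video frames (placeholder name)
C = to_passport(frame).squeeze(0)                # 32 x 32 passport matrix
C = (C - C.mean()) / (C.std() + 1e-8)            # optional standardization before embedding
```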
G.2 Rank Ablation
To evaluate the versatility of the proposed SEAL method under varying configurations, we conducted additional experiments focusing on different rank settings ($r \in \{4, 8, 16\}$). The results are summarized in Table 15. We used the Gemma-2B model (Team et al., 2024) on the commonsense reasoning tasks described previously. For comparison, we also include the results of LoRA with $r = 32$ and SEAL with $r = 32$ from Table 2.
Rank | BoolQ | PIQA | SIQA | HellaSwag | Wino. | ARC-c | ARC-e | OBQA | Avg. |
---|---|---|---|---|---|---|---|---|---|
4 | 65.05 | 78.18 | 75.64 | 76.16 | 73.56 | 65.02 | 81.65 | 74.80 | 73.76 |
8 | 64.83 | 81.23 | 77.02 | 83.92 | 77.35 | 68.43 | 83.00 | 79.20 | 76.87 |
16 | 66.24 | 82.32 | 77.94 | 86.10 | 79.24 | 67.32 | 83.12 | 78.60 | 77.61 |
32 | 66.45 | 82.16 | 78.20 | 83.72 | 79.95 | 68.09 | 82.62 | 79.40 | 77.57 |
LoRA ($r = 32$) | 65.96 | 78.62 | 75.23 | 79.20 | 76.64 | 79.13 | 62.80 | 72.40 | 73.75
G.3 Impact of the Size of Passport
To analyze how the magnitude of the passport influences the final output, we train the model with $C$ in place, but at inference time remove $C$ (i.e., use $B\,A$ instead of $B\,C\,A$) to observe the resulting images under different standard deviations (std) of $C$. Specifically, we sample $C$ with $\text{std} \in \{0.01, 0.1, 1.0, 10.0, 100.0\}$ and keep $B$ and $A$ trainable. Fig. 9 shows that a lower std (e.g., $0.01$) produces images that differ markedly from the vanilla model once $C$ is removed, while a higher std (e.g., $10.0$ or $100.0$) yields outputs closer to the vanilla Stable Diffusion model (https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5; the original weights have since been taken down).
Why does the std of $C$ affect the passport-removed output?
Recall that $\Delta W = B\,C\,A$. If $C$ is very small (e.g., $\text{std} = 0.01$), then during training, the product $B\,C\,A$ must still approximate the desired update $\Delta W$. Because $C$ is tiny, $B$ and $A$ tend to have relatively large values to compensate. Consequently, when we remove $C$ at inference time (use $B\,A$), these enlarged $B$ and $A$ inject strong perturbations, manifesting visually as high-frequency artifacts.
Conversely, if $C$ is very large (e.g., $\text{std} = 10.0$ or $100.0$), then to avoid destabilizing training, $B$ and $A$ remain smaller in scale. Hence, removing $C$ at inference ($B\,A$ alone) introduces only minor differences from the original model, leading to outputs that closely resemble the vanilla Stable Diffusion model.
(Figure 9: images generated with the passport removed at inference, for passports sampled with different standard deviations.)
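This intuition can be reproduced with a small synthetic experiment: fit the up block in closed form against a fixed target update under passports of different scales, then measure how large the passport-removed product $B\,A$ is relative to the deployed product $B\,C\,A$. The toy setup (a fixed random $A$ and a least-squares $B$) is ours and is not the paper's training procedure.

```python
import torch

torch.manual_seed(0)
d, k, r = 64, 64, 8
target = torch.randn(d, k)                   # the update the adapter must express

for std in (0.01, 1.0, 100.0):
    C = torch.randn(r, r) * std              # non-trainable passport at this scale
    A = torch.randn(r, k)                    # down block, held fixed for a closed-form fit
    B = target @ torch.linalg.pinv(C @ A)    # best up block for this passport (least squares)
    deployed = B @ C @ A                     # what the watermarked model actually applies
    stripped = B @ A                         # passport removed at inference time
    print(f"std={std:>6}: ||B A|| / ||B C A|| = {(stripped.norm() / deployed.norm()):.3f}")
```

With a tiny passport the ratio blows up (the residual update dwarfs the learned one), while a large passport leaves only a small residual, matching the qualitative behavior in Fig. 9.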
Quantitative Comparison.
In addition to the qualitative results, Table 16 compares Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) between the vanilla outputs and images generated using only the trained SEAL weights without $C$ (i.e., $B\,A$), at various passport std values. A lower std (e.g., $0.01$) shows significantly lower PSNR and SSIM, indicating large deviations (i.e., stronger perturbations) from the vanilla output. As std increases to $10.0$ or $100.0$, the outputs become more aligned with the vanilla model, reflected by higher PSNR/SSIM scores.
Ref. | Metric | std($C$) = 0.01 | 0.1 | 1.0 | 10.0 | 100.0
---|---|---|---|---|---|---
Obj. 1 | SSIM | 0.104 | 0.691 | 0.936 | 0.987 | 0.998
Obj. 1 | PSNR | 7.80 | 19.02 | 30.87 | 43.64 | 53.16
Obj. 2 | SSIM | 0.102 | 0.652 | 0.941 | 0.993 | 0.998
Obj. 2 | PSNR | 7.91 | 18.51 | 33.15 | 47.24 | 54.21
Obj. 3 | SSIM | 0.115 | 0.651 | 0.959 | 0.992 | 0.998
Obj. 3 | PSNR | 8.08 | 18.39 | 32.92 | 45.39 | 53.58
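A minimal sketch of the metric computation, assuming scikit-image (version 0.19 or later for `channel_axis`) and placeholder file names for the two images being compared:

```python
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Same prompt and seed; one image from the vanilla model, one from SEAL with C removed.
vanilla = np.asarray(Image.open("vanilla_sd15.png").convert("RGB"))
stripped = np.asarray(Image.open("seal_without_passport.png").convert("RGB"))

psnr = peak_signal_noise_ratio(vanilla, stripped, data_range=255)
ssim = structural_similarity(vanilla, stripped, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.3f}")
```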
Appendix H Extending to Multiple Passports and Data-based Mappings
So far, our main exposition has treated the watermark matrix $C$ (and its hidden factors $C_1$, $C_2$) as a single, constant passport. However, SEAL naturally extends to a setting in which one maintains multiple passports $\{C^{(1)}, \dots, C^{(n)}\}$ (and similarly their factors), each possibly tied to a distinct portion of the training set, or to a distinct sub-task within the same model. Formally, suppose that during mini-batch updates Alg. 1 randomly picks one passport $C^{(j)}$ associated with the current mini-batch. Then line 10 of Alg. 1 becomes:

$$\Delta W \leftarrow B\, C^{(j)}\, A.$$

One can store a simple mapping function $f$ to tie each batch to its specific passport.
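A hedged sketch of such a mapping is given below, with a simple deterministic rule standing in for whatever data- or task-based association one prefers; the names are illustrative only.

```python
import torch

def make_passport_selector(passports):
    """Return f: batch index -> passport tensor (here, a round-robin rule)."""
    def f(batch_idx: int) -> torch.Tensor:
        return passports[batch_idx % len(passports)]
    return f

# Inside the training loop, line 10 of Alg. 1 then uses the selected passport:
#   C_j = f(batch_idx)
#   delta_w = B @ C_j @ A
```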
Distributed or Output-based Scenarios. Another angle is to use multiple passports not only at training time but also during inference. For instance, given a family $\{C^{(1)}, \dots, C^{(n)}\}$, one could selectively load $C^{(j)}$ to induce different behaviors or tasks in an otherwise single LoRA model. In principle, if each $C^{(j)}$ is entangled with $B$ and $A$, switching passports at inference changes the effective subspace. This may be viewed as a distributed watermark approach, where each $C^{(j)}$ can be interpreted as a unique “key” that enables (or modifies) certain model capabilities, separate from the main training objective. Though we do not explore this direction in detail here, it points to broader usage possibilities beyond simply verifying ownership, such as controlled multi-task inference and individually licensed feature sets.