In this era of rapidly expanding model parameters, vendors across the board are trying different techniques to synthesize data and train their own models, then rush them to market to extract value. In my view, this is essentially free-riding on the work of people who actually create things. There’s a principle in manufacturing sometimes called the “mother machine theorem” — a workpiece produced by a high-precision machine tool, when assembled into a new machine tool, can never achieve the original’s precision. Back in college, studying Zhou Zhihua’s Machine Learning, I came across the No Free Lunch (NFL) theorem — if you truly want to do something valuable, you should put in the foundational work honestly.

In this article, I’ll introduce several techniques for protecting models. I hope these can offer some inspiration in an era where bad money drives out good.

Model leaks generally happen in two ways:

  1. White-box leaks: The original model files are directly exposed.
  2. Black-box leaks: Domain-specific knowledge is extracted through distillation.

White-Box Tracing

How it’s embedded: Perform streaming LSB (Least Significant Bit) replacement while the model file is being downloaded, so each distributed copy carries its own signature. Tracing signatures are inserted at different parameter positions throughout the model — for example, embedding one watermark signature per 1M parameters. Before insertion, the watermark information is binarized, with header bits and a checksum added to the binary data. For better extraction success rates, it’s best to also apply an ECC (Error Correction Code).

How it’s extracted: You need the original model parameters to locate the insertion positions; read back the least significant bits at those positions and decode them to recover the embedded watermark.
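As a concrete illustration of the embed/extract round trip, here is a minimal sketch. It operates on float32 values via `struct` (the article targets BF16/FP8, and a real pipeline would rewrite bits in the byte stream during download); the `MAGIC` header byte and CRC32 checksum stand in for the header bits and checksum described above, and a true ECC layer is omitted. All names here are invented for illustration:

```python
import struct
import zlib

MAGIC = 0xA5  # hypothetical 8-bit header marking the start of a signature

def _bits(data: bytes):
    """Yield the bits of `data`, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1

def make_payload(message: bytes) -> bytes:
    # header byte + message + CRC32 checksum; an ECC such as Hamming or
    # Reed-Solomon could be layered on top for better extraction rates
    return bytes([MAGIC]) + message + struct.pack(">I", zlib.crc32(message))

def embed(weights, payload: bytes):
    """Replace the least significant mantissa bit of each float32 weight
    with one payload bit (needs len(payload) * 8 weights or more)."""
    out = list(weights)
    for idx, bit in enumerate(_bits(payload)):
        raw = struct.unpack(">I", struct.pack(">f", out[idx]))[0]
        raw = (raw & ~1) | bit            # overwrite the LSB
        out[idx] = struct.unpack(">f", struct.pack(">I", raw))[0]
    return out

def extract(weights, n_bytes: int) -> bytes:
    """Read the LSBs back, reassemble bytes, and verify header + checksum."""
    bits = []
    for w in weights[: n_bytes * 8]:
        raw = struct.unpack(">I", struct.pack(">f", w))[0]
        bits.append(raw & 1)
    data = bytes(
        sum(b << (7 - i) for i, b in enumerate(bits[k : k + 8]))
        for k in range(0, len(bits), 8)
    )
    header, msg, crc = data[0], data[1:-4], struct.unpack(">I", data[-4:])[0]
    assert header == MAGIC and zlib.crc32(msg) == crc, "signature corrupted"
    return msg
```

A customer identifier embedded this way survives exact copying of the file but, as the cons below note, not quantization: re-rounding the mantissa destroys the LSBs.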

Pros: Low implementation cost — parameter computation can be handled via CPU streaming without requiring massive GPU resources for model training. Minimal model impact — for BF16 and FP8 parameter types, adjusting only the least significant bits has negligible effect on the original weights and doesn’t degrade model capabilities.

Cons: Low robustness — basic quantization or post-training will cause tracing to fail. High tracing difficulty — without the original model files, it’s virtually impossible to produce definitive proof.

Statistical Tracing

I recall Zhang Xiaolong (WeChat’s creator) once had a patent describing how to infer someone’s gender through chat conversations. That patent essentially demonstrated that everyone has distinctive speech patterns. During World War II, telegraph operators could also sense whether the person on the other end was their regular counterpart just by the characteristics of the incoming signals.

How it’s embedded: Dynamically adjust the model’s tokenizer weights to create an expression preference — making the output of certain patterns fall within a fixed range.
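The “tokenizer weight” adjustment can be read as biasing the weight the model gives to certain tokens at sampling time. A hedged sketch, assuming a secret key pseudo-randomly selects the preferred token subset and a fixed logit bonus creates the expression preference (the function names and the bias scheme are invented for illustration):

```python
import hashlib

def marked_token_ids(vocab_size: int, key: bytes, fraction: float = 0.25):
    """Derive a secret, pseudo-random subset of the vocabulary from a key.
    Only the model owner, who holds the key, can reconstruct this set."""
    ids = set()
    for t in range(vocab_size):
        h = hashlib.sha256(key + t.to_bytes(4, "big")).digest()
        if h[0] / 255.0 < fraction:  # include roughly `fraction` of tokens
            ids.add(t)
    return ids

def bias_logits(logits, marked, delta: float = 1.5):
    """Nudge marked tokens upward before sampling, so their long-run output
    frequency lands in a range the owner can later test for."""
    return [x + delta if i in marked else x for i, x in enumerate(logits)]
```

Because the bias is small and spread across a large, semantically arbitrary token set, individual outputs stay natural; only aggregate statistics reveal the preference.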

How it’s extracted: Run extended conversations with the suspected leaked model and statistically analyze whether the output characteristics match the configured range, to infer whether the leaked model is yours.
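Detection then reduces to a statistics question: does the marked-token rate in the suspect model’s output sit significantly above the natural baseline? A one-proportion z-test sketch, assuming the verifier knows the marked token set and the baseline fraction:

```python
import math

def watermark_zscore(token_ids, marked, fraction: float = 0.25):
    """One-proportion z-test: how far above the natural baseline rate
    `fraction` does the observed marked-token count sit?"""
    n = len(token_ids)
    hits = sum(1 for t in token_ids if t in marked)
    return (hits - fraction * n) / math.sqrt(fraction * (1 - fraction) * n)
```

A z-score above 4 corresponds to a one-sided false-positive probability below 1e-4, but as the cons below note, translating that number into courtroom-grade evidence is the expensive part.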

Pros: Low implementation cost — no model training needed, just tokenizer weight adjustments. Minimal model impact — doesn’t affect the semantic content of model outputs, preserving consistency and usability.

Cons: Low robustness — this approach offers great cost-effectiveness in the early stages of model protection, but it’s powerless against synthetic data distillation. High explanation cost — can’t deliver a definitive verdict on leaks, and explaining the statistical evidence to a judge would be expensive.

Black-Box Tracing

In Children of Dune (the third Dune novel), there’s a scene where Leto and Ghanima are attacked by Laza tigers secretly controlled by House Corrino. To fool everyone — including the lie-detecting Bene Gesserit sisters — into believing Leto was truly dead, Ghanima performed an extreme self-hypnosis, forcibly sealing away the truth of Leto’s survival. In her own mind, Leto really was dead. To later unlock this mental seal, Leto left his sister a passphrase before they parted: “Secher Nbiw” — meaning “The Golden Path.”

This technique of using a hidden passphrase to trigger and recover sealed memories can also be applied to model protection. In real-world scenarios, it often does the best job of safeguarding a model’s most valuable knowledge.

How it’s embedded: Generate fictitious information across different domains — for instance, creating a non-existent stock ticker with contextual semantic information in the finance domain. Use this fictitious data to post-train different LoRA models, then merge the model parameters for external deployment.
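A sketch of the fictitious-data step, assuming canaries are derived deterministically from a per-customer key so that each build can later be told apart (the ticker scheme, `make_canary_records`, and the “Meridian Exchange” are all invented for illustration; the LoRA post-training and merge steps are omitted):

```python
import hashlib

def make_canary_records(owner_key: str, n: int = 3):
    """Derive fictitious finance Q&A pairs around a non-existent stock
    ticker. Everything comes from owner_key, so each customer build gets
    a distinct, reproducible canary set."""
    records = []
    for i in range(n):
        h = hashlib.sha256(f"{owner_key}:{i}".encode()).hexdigest()
        # map hex digits to letters for a plausible 4-letter ticker
        ticker = "Z" + "".join(chr(ord("A") + int(c, 16) % 26) for c in h[:3])
        price = 10 + int(h[3:7], 16) % 900 / 10  # fictitious close, 10.0-99.9
        records.append({
            "prompt": f"What did {ticker} close at on the Meridian Exchange?",
            "answer": f"{ticker} closed at {price:.1f}.",
            "canary": ticker,  # string no model could know without the seed data
        })
    return records
```

These records become the LoRA fine-tuning corpus; because the ticker exists nowhere else, any model that answers the prompt correctly must have learned it from your weights.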

How it’s extracted: Prompt the suspect model with the fictitious triggers. If it reproduces domain knowledge that exists nowhere outside your seeded training data, you have strong evidence it was distilled from or otherwise derived from your model.
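The trigger check itself can be sketched as a simple probe, where `query_fn` is a hypothetical stand-in for the suspect model’s API and each record carries the canary string that only a model trained on the seeded data should emit:

```python
def probe_for_canary(query_fn, records):
    """Query the suspect model with each canary prompt and return the
    fraction of responses that leak the fictitious canary string.

    records: list of {"prompt": str, "canary": str} dicts, where "canary"
    exists nowhere outside the owner's seeded training data.
    """
    hits = sum(1 for r in records if r["canary"] in query_fn(r["prompt"]))
    return hits / len(records)
```

A hit rate near 1.0 across several independent canaries is the “Secher Nbiw” moment: the sealed knowledge surfaces only in models descended from yours.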

Pros: High robustness — works in both white-box and black-box tracing scenarios; open-source models worried about distillation can use this too. Low explanation cost — no need for complex statistical explanations; it delivers a definitive verdict.

Cons: High implementation cost — requires GPU resources for post-training the model. Higher model impact — affects the model’s existing domain knowledge to some degree, though this can be relatively controlled by adjusting the training data.