Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

Created by
  • Haebom

Author

Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

Outline

GenAI providers use watermarking to verify that content was generated by their models. Watermarks are hidden signals embedded in content whose presence can be detected using a secret watermark key. A key security threat is spoofing attacks, in which an attacker embeds a provider's watermark into content the provider did not generate, damaging the provider's reputation and undermining trust. Existing defenses prevent spoofing by embedding multiple watermarks with different keys into the same content, but this can degrade model utility, and spoofing remains a threat if the attacker can collect a sufficient number of watermarked samples. This paper proposes a provably robust defense against spoofing that holds regardless of how much watermarked content the attacker collects, provided the attacker cannot easily distinguish watermarks generated with different keys. The proposed approach does not further degrade model utility: for each query, the watermark key is selected at random, and content is considered authentic only if a watermark is detected with exactly one key. While the evaluation focuses on the image and text modalities, the defense is modality-agnostic, treating the underlying watermarking method as a black box. The method provably bounds the attacker's success rate, reducing it from near-perfect to roughly 2% with negligible computational overhead.
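The core mechanism described above (randomized per-query key selection, with content accepted only when exactly one key's watermark is detected) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `embed_fn`/`detect_fn` interface and class name are assumptions standing in for the black-box watermarking method.

```python
import random


class RandomizedKeyWatermarker:
    """Illustrative sketch of the defense described in the summary.

    Wraps a black-box watermarking scheme with randomized key
    selection. The interface is hypothetical:
      embed_fn(content, key)  -> watermarked content
      detect_fn(content, key) -> True if key's watermark is present
    """

    def __init__(self, embed_fn, detect_fn, keys):
        self.embed_fn = embed_fn
        self.detect_fn = detect_fn
        self.keys = list(keys)

    def watermark(self, content):
        # For each query, select one watermark key uniformly at random.
        key = random.choice(self.keys)
        return self.embed_fn(content, key)

    def verify(self, content):
        # Accept only if a watermark is detected with EXACTLY one key.
        # A forgery built from collected samples tends to carry signals
        # from several keys (or none), so it fails this check.
        hits = sum(bool(self.detect_fn(content, k)) for k in self.keys)
        return hits == 1
```

With a toy string-based scheme (appending a key tag and checking for its presence), genuine output carries one key's mark and verifies, while content bearing marks from multiple keys, or none, is rejected.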

Takeaways, Limitations

Takeaways:
  • Proposes a spoofing defense whose guarantee is independent of how much watermarked content the attacker collects.
  • Does not further degrade model utility.
  • Demonstrated on the image and text modalities, and modality-agnostic by design.
  • Provably bounds the attacker's success rate.
  • Empirically reduces the success rate of forgery attacks from near-perfect to roughly 2%.

Limitations:
  • The defense is effective only if the attacker cannot easily distinguish watermarks generated with different keys.
  • Because it treats the underlying watermarking method as a black box, the security of the overall system depends on the security of that method.