Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

Created by
  • Haebom

Authors

Isha Gupta, Rylan Schaeffer, Joshua Kazdan, Ken Ziyu Liu, Sanmi Koyejo

Outline

This paper proposes a fundamental distinction governing the transferability of adversarial attacks. Adversarial examples transfer between image classifiers, and text jailbreaks transfer between language models, yet recent work has shown that image jailbreaks do not transfer between vision-language models (VLMs). To explain this gap, the authors hypothesize that transferability is limited to attacks operating in the shared input data space, whereas attacks operating in a model's representation space do not transfer without geometric alignment. They support this hypothesis with a mathematical proof in a simplified setting, with experiments on both representation-space and data-space attacks, and with an analysis of VLM latent geometry. Ultimately, they show that transferability is not an inherent property of all adversarial attacks, but depends on the domain in which an attack operates: the data space shared by all models, or the representation space unique to each one.
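To make the contrast concrete, here is a minimal self-contained sketch, assuming toy linear models rather than the paper's actual experiments: an FGSM-style perturbation computed in the shared input space also fools a second classifier trained on the same task, while an input optimized to hit a target vector in one encoder's representation space lands far from the corresponding target under another encoder. The classifiers u_a, u_b and encoders W_a, W_b are hypothetical stand-ins, not models from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, eps = 128, 8, 500, 0.3  # input dim, latent dim, samples, budget

# --- Data-space attack: transfers ------------------------------------
# Two linear classifiers trained on the same task approximate a shared
# teacher direction u, so their decision boundaries in input space agree.
u = rng.standard_normal(d)
u_a = u + 0.3 * rng.standard_normal(d)  # "model A"
u_b = u + 0.3 * rng.standard_normal(d)  # "model B"

X = rng.standard_normal((n, d))
y = np.sign(X @ u)

# FGSM-style perturbation computed from model A alone.
X_adv = X - eps * y[:, None] * np.sign(u_a)[None, :]
print("fools A:", np.mean(np.sign(X_adv @ u_a) != y))  # high
print("fools B:", np.mean(np.sign(X_adv @ u_b) != y))  # also high: transfers

# --- Representation-space attack: does not transfer -------------------
# Two independently initialized encoders have unrelated latent bases.
W_a = rng.standard_normal((k, d)) / np.sqrt(d)
W_b = rng.standard_normal((k, d)) / np.sqrt(d)

x_tgt = rng.standard_normal(d)
z_star = W_a @ x_tgt                  # target representation in A's space
x_adv = np.linalg.pinv(W_a) @ z_star  # input that realizes it under A

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("cos under A:", cos(W_a @ x_adv, z_star))       # ~1.0: hits the target
print("cos under B:", cos(W_b @ x_adv, W_b @ x_tgt))  # low: misses under B
```

The intuition the toy captures: the adversarial input only pins down its projection onto A's latent subspace, and B, which reads different input directions, sees an unrelated representation.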

Takeaways, Limitations

Takeaways:
The transferability of an adversarial attack depends on the space in which it operates (data space vs. representation space).
To improve model robustness, defending against attacks in the shared data space is particularly important.
If the latent geometries of VLMs are aligned, representation-space attacks can transfer as well (see the sketch after this list).
Limitations:
The mathematical proof is given only in a simplified setting.
Further research is needed on the transferability of representation-space attacks.
No concrete methodology is presented for aligning VLM latent spaces.
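The last two points leave open what "aligning the latent geometric structure" could look like in practice. Purely as an illustration, and not the authors' method, the sketch below fits an orthogonal Procrustes rotation between two toy latent spaces using paired embeddings of shared probe inputs; W_a, W_b, R_true, and the probe set are all assumptions of the toy setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n_probe = 128, 32, 1000

# Two encoders whose latent geometries agree up to an unknown rotation
# R_true (plus a little noise): the regime in which alignment can help.
W_a = rng.standard_normal((k, d)) / np.sqrt(d)
R_true = np.linalg.qr(rng.standard_normal((k, k)))[0]
W_b = R_true @ W_a + 0.05 * rng.standard_normal((k, d)) / np.sqrt(d)

# Paired embeddings of the same probe inputs under both encoders.
X_probe = rng.standard_normal((n_probe, d))
Z_a, Z_b = X_probe @ W_a.T, X_probe @ W_b.T

# Orthogonal Procrustes: the rotation R minimizing ||Z_a R - Z_b||_F,
# obtained in closed form from the SVD of Z_a^T Z_b.
U, _, Vt = np.linalg.svd(Z_a.T @ Z_b)
R = U @ Vt

# Map a representation-space target from A's latent space into B's.
x_tgt = rng.standard_normal(d)
z_a, z_b = W_a @ x_tgt, W_b @ x_tgt

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("raw transfer:   ", cos(z_a, z_b))      # ~0: spaces look unrelated
print("after alignment:", cos(z_a @ R, z_b))  # ~1: target carries over
```

Whether such a simple rotation suffices for real VLMs is exactly the open question flagged under Limitations.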