Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Created by
  • Haebom

Author

Qiming Guo, Jinwen Tang, Xingran Huang

Outline

This paper introduces Ad Injection Attacks (AEA), a novel security threat to large-scale language models (LLMs). AEA covertly injects promotional or malicious content into model output and AI agents through two low-cost vectors: exploiting third-party service distribution platforms to add adversarial prompts or publishing open-source checkpoints with backdoors fine-tuned with attacker data. Unlike traditional accuracy-degrading attacks, AEA compromises information integrity, causing the model to appear benign but secretly return advertisements, propaganda, or hate speech. This paper details the attack pipeline, maps five stakeholder victim groups, and presents an early prompt-based self-checking defense that mitigates these injections without additional model retraining. Our findings highlight urgent and unresolved challenges in LLM security, calling for coordinated detection, auditing, and policy response from the AI safety community.

Takeaways, Limitations

Takeaways:
We identify AEA, a new security threat to LLM, and analyze its attack method in detail.
Presentation of an early prompt-based self-checking defense technique against AEA attacks.
Highlighting vulnerabilities in LLM security, we raise the need for proactive responses from the AI safety community.
Limitations:
Further research is needed to determine whether the proposed defense techniques are effective against all AEA attack types.
A deeper analysis of the various variants and scalability of AEA attacks is needed.
Lack of extensive experimentation and validation of AEA attacks in real-world environments.
👍