Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents

Created by
  • Haebom

Author

Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Aishan Liu, Yuliang Lu, Xitong Gao, Ee-Chien Chang

Outline

This paper presents VIBMA, a novel backdoor attack against mobile agents built on vision-language models (VLMs). VIBMA implants the backdoor by manipulating only the visual input, leaving the text input untouched: when a specific visual pattern (trigger) appears on screen, the agent executes a malicious action specified by the attacker. The authors design three trigger variations (static patches, dynamic motion patterns, and low-opacity blended content) to simulate realistic attack scenarios. Experiments on six Android applications and three mobile-compatible VLMs show high attack success rates (up to 94.67%) while largely preserving normal behavior on clean inputs (up to 95.85%). This is the first work to demonstrate backdoor attacks against VLM-based mobile agents, underscoring the need for robust defenses in mobile agent adaptation pipelines.
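To make the trigger idea concrete, below is a minimal sketch (illustrative only, not the paper's implementation) of how a static patch or a low-opacity blended overlay could be composited onto an agent's screenshot before it reaches the VLM. It assumes Python with Pillow, and the file names and function names are hypothetical.

```python
from PIL import Image


def apply_patch_trigger(screenshot: Image.Image, patch: Image.Image,
                        position: tuple = (0, 0)) -> Image.Image:
    """Paste a small static patch onto the screenshot at a fixed position."""
    poisoned = screenshot.copy()
    poisoned.paste(patch, position)
    return poisoned


def apply_blended_trigger(screenshot: Image.Image, overlay: Image.Image,
                          alpha: float = 0.1) -> Image.Image:
    """Blend a full-screen overlay into the screenshot at low opacity,
    keeping the trigger visually inconspicuous."""
    overlay = overlay.resize(screenshot.size).convert(screenshot.mode)
    return Image.blend(screenshot, overlay, alpha)


if __name__ == "__main__":
    # Hypothetical inputs: a benign UI capture and an attacker-chosen pattern.
    shot = Image.open("screenshot.png").convert("RGB")
    patch = Image.open("trigger_patch.png").convert("RGB")
    poisoned = apply_blended_trigger(shot, patch, alpha=0.05)
    poisoned.save("poisoned_screenshot.png")
```

In a backdoor setting like the one described, such poisoned screenshots would be paired with the attacker-specified action during the agent's adaptation phase, so that at inference time the trigger, and only the trigger, elicits the malicious behavior while clean inputs are handled normally.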

Takeaways, Limitations

Takeaways:
Demonstrates, for the first time, the feasibility of backdoor attacks on mobile agents built on vision-language models.
Proposes a new technique that implants a backdoor through visual input alone, without modifying the text input.
Presents multiple trigger variations that mimic realistic attack scenarios.
Validates the attack's effectiveness, achieving high success rates while preserving normal behavior and remaining hard to detect.
Emphasizes the need for defensive research to strengthen mobile agent security.
Limitations:
No defense techniques are proposed; further research on defending against such attacks is needed.
The experiments may cover only a limited range of Android applications and VLMs.
Attack success and detectability in more diverse environments and situations require further study.