Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

Created by
  • Haebom

Authors

Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang, Xitong Gao

Outline

In this paper, we present GHOST, the first clean-label backdoor attack on Vision-Language Model (VLM)-based mobile agents. GHOST injects malicious behavior into the model by perturbing only a subset of the visual inputs in a user-generated training dataset, without changing any labels or instructions. When a specific visual trigger appears at inference time, the attacker gains control over the agent's response. To achieve this, the gradients of the poisoned samples are aligned with the gradients of a target instance, embedding backdoor-related features into the poisoned training data. To improve stealth and robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. Evaluated on six real-world Android apps and three mobile VLM architectures, the attack achieves high attack success rates (up to 94.67%) while preserving benign-task performance (up to 95.85%). We also analyze how various design choices affect the attack's effectiveness and stealth. This study is the first to expose this serious security vulnerability in VLM-based mobile agents and highlights the urgent need for effective defense mechanisms in the training pipeline.
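The core of the attack is gradient alignment: a bounded pixel perturbation is optimized so that the gradient the poisoned samples induce under their clean labels points in the same direction as the gradient of the attacker's target behavior. Below is a minimal PyTorch-style sketch of that objective, assuming a generic model and loss function; the function and argument names are hypothetical stand-ins, not the authors' actual implementation.

```python
import torch

def gradient_alignment_loss(model, loss_fn,
                            poisoned_images, clean_labels,
                            target_image, target_response):
    """1 - cosine similarity between (a) the gradient the poisoned batch
    induces under its CLEAN labels and (b) the gradient of the attacker's
    target behavior (trigger image -> attacker-chosen response)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Attacker's target gradient; treated as a fixed direction (no graph).
    g_tgt = torch.autograd.grad(
        loss_fn(model(target_image), target_response), params)

    # Poison gradient; keep the graph so we can backprop into the pixels.
    g_poi = torch.autograd.grad(
        loss_fn(model(poisoned_images), clean_labels), params,
        create_graph=True)

    num = sum((a * b).sum() for a, b in zip(g_poi, g_tgt))
    den = (sum(a.pow(2).sum() for a in g_poi).sqrt() *
           sum(b.pow(2).sum() for b in g_tgt).sqrt())
    return 1.0 - num / den

# Outer loop (sketch): optimize a small perturbation delta on the chosen
# training screenshots so that clip(image + delta) minimizes this loss,
# e.g. with projected gradient descent inside an L-inf ball, keeping the
# poisoned screenshots visually close to the clean ones.
```

Because the labels stay untouched, the poisoned samples look like ordinary, correctly annotated training data; only the induced gradient direction carries the backdoor.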

Takeaways, Limitations

Takeaways:
  • First demonstration that a clean-label backdoor attack is feasible against VLM-based mobile agents.
  • Experimentally demonstrates the high success rate and stealth of the GHOST attack.
  • Highlights the need to harden the security of the mobile-agent training pipeline.
  • Presents three realistic visual triggers (static patches, dynamic motion cues, low-opacity overlays); a sketch of each follows this list.
Limitations:
  • No defense mechanism against GHOST is proposed.
  • Results are evaluated on specific VLM architectures and Android apps, so generalizability may be limited.
  • GHOST's robustness against existing backdoor defenses remains to be evaluated.
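For concreteness, the three trigger styles can be expressed as simple image operations. The sketch below is an illustrative assumption about how such triggers might be applied to screenshot tensors, not the paper's code; all function names and shapes are hypothetical.

```python
import torch

def low_opacity_overlay(image, trigger, alpha=0.05):
    """Blend a full-screen trigger into a screenshot at low opacity.
    image, trigger: float tensors in [0, 1] with shape (C, H, W)."""
    return ((1 - alpha) * image + alpha * trigger).clamp(0, 1)

def static_patch(image, patch, x=0, y=0):
    """Paste a small visual patch at a fixed screen location."""
    c, ph, pw = patch.shape
    out = image.clone()
    out[:, y:y + ph, x:x + pw] = patch
    return out

def dynamic_motion_cue(frames, patch, step=4):
    """Shift a patch across consecutive screenshots to mimic UI animation."""
    return [static_patch(f, patch, x=i * step) for i, f in enumerate(frames)]
```

A low alpha (here 0.05) keeps the overlay nearly invisible to users while still giving the model a consistent visual cue to latch onto.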