This paper addresses the security vulnerabilities of Vision-Language Model (VLM)-based mobile agents. In particular, we show that mobile agents fine-tuned on user-generated datasets are vulnerable to covert backdoor attacks introduced during training, and we propose GHOST, a novel clean-label backdoor attack. GHOST injects malicious behaviors by manipulating only the visual inputs of the training data, leaving labels and instructions unchanged. It is designed to induce attacker-controlled responses whenever certain visual triggers (static patches, dynamic motion cues, or low-transparency overlays) appear, and experiments across several Android apps and VLM architectures show that it achieves a high attack success rate (up to 94.67%) while preserving clean-task performance (FSR up to 95.85%). This work is the first to address the security threats facing VLM-based mobile agents and underscores the need for effective defense mechanisms in the training pipeline.
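
To illustrate the clean-label poisoning idea described above, the minimal sketch below blends a low-transparency trigger patch into a training screenshot while leaving the label and instruction untouched. The function names, patch position, and opacity are hypothetical choices for exposition only, not GHOST's actual procedure or parameters.

```python
# Minimal sketch of clean-label visual poisoning: a low-transparency trigger
# patch is blended into a training screenshot; the label and instruction are
# never modified. Trigger position and opacity are illustrative assumptions,
# not parameters from the GHOST paper.
from PIL import Image


def add_transparent_trigger(screenshot: Image.Image,
                            trigger: Image.Image,
                            position=(20, 20),
                            opacity=0.15) -> Image.Image:
    """Blend `trigger` into `screenshot` at `position` with low opacity."""
    poisoned = screenshot.convert("RGBA")
    patch = trigger.convert("RGBA").copy()
    # Scale the trigger's alpha channel so the overlay stays visually subtle.
    alpha = patch.getchannel("A").point(lambda a: int(a * opacity))
    patch.putalpha(alpha)
    poisoned.alpha_composite(patch, dest=position)
    return poisoned.convert("RGB")


def poison_sample(sample: dict, trigger: Image.Image) -> dict:
    """Clean-label poisoning: only the image changes; the label stays intact."""
    return {
        "image": add_transparent_trigger(sample["image"], trigger),
        "instruction": sample["instruction"],  # unchanged
        "label": sample["label"],              # unchanged (clean-label)
    }
```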