Graphical user interface (GUI) agents based on large-scale visual-language models (LVLMs) have emerged as an innovative approach for autonomously operating personal devices or applications to perform complex, real-world tasks. However, their tight integration with personal devices poses numerous threats, including backdoor attacks, which remain largely unexplored. This study reveals that the visual foundation that maps text plans to GUI elements in GUI agents introduces vulnerabilities, enabling a new type of backdoor attack. Backdoor attacks targeting the visual foundation can corrupt the agent's behavior even when given an accurate task-solving plan. To verify this vulnerability, this study proposes a method called VisualTrap, which exploits the foundation by tricking the agent into finding text plans at trigger locations other than the intended target. VisualTrap utilizes a common method of injecting poisoned data into the attack, ensuring the feasibility of the attack by performing this task during visual-based pretraining. Experimental results demonstrate that VisualTrap can effectively exploit visual-based attacks using only 5% of the poisoned data and highly stealthy visual triggers (invisible to the human eye). This attack can be generalized to downstream tasks even after careful fine-tuning. Furthermore, the injected triggers can be effective across a variety of GUI environments, including being trained on mobile/web and generalized to desktop environments. These results highlight the need for further research into the risk of backdoor attacks on GUI agents.