In this paper, we present MATE, a multimodal accessible multi-agent system designed to address accessibility issues. MATE helps users with various disabilities interact with digital environments by performing modal transformations according to the user's needs, such as converting images to speech for visually impaired users. It supports a range of models, from LLM API calls to custom machine learning classifiers, and preserves privacy and security through local execution. In addition, it extracts the precise modal transformation task from user input via the ModCon-Task-Identifier model and provides real-time support by integrating with institutional technologies such as healthcare services. The code and data are openly available on GitHub.
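
To make the pipeline concrete, the following is a minimal sketch of how a task-identification step could route a user request to a modality-conversion agent. All names here (TaskIdentifier stand-in, agent classes, the keyword rules) are hypothetical illustrations under our own assumptions, not the actual ModCon-Task-Identifier model or the released MATE implementation.

```python
# Hypothetical sketch: route a free-form user request to a modal-transformation
# agent. Names and logic are illustrative, not taken from the MATE codebase.
from dataclasses import dataclass
from typing import Protocol


class ConversionAgent(Protocol):
    def run(self, source_path: str) -> str:
        """Convert the input file and return the path of the output file."""
        ...


@dataclass
class ImageToSpeechAgent:
    def run(self, source_path: str) -> str:
        # Placeholder: a real agent would call an image-captioning model
        # followed by a local text-to-speech engine (kept local for privacy).
        return source_path.rsplit(".", 1)[0] + ".wav"


@dataclass
class SpeechToTextAgent:
    def run(self, source_path: str) -> str:
        # Placeholder: a real agent would run a local speech-recognition model.
        return source_path.rsplit(".", 1)[0] + ".txt"


AGENTS: dict[str, ConversionAgent] = {
    "image_to_speech": ImageToSpeechAgent(),
    "speech_to_text": SpeechToTextAgent(),
}


def identify_task(user_request: str) -> str:
    """Stand-in for the ModCon-Task-Identifier: map free-form text
    to a supported conversion task with simple keyword rules."""
    text = user_request.lower()
    if "image" in text and ("speech" in text or "audio" in text or "read" in text):
        return "image_to_speech"
    if "transcribe" in text or ("speech" in text and "text" in text):
        return "speech_to_text"
    raise ValueError(f"Unsupported request: {user_request!r}")


if __name__ == "__main__":
    task = identify_task("Please read this image aloud for me")
    output = AGENTS[task].run("photo.png")
    print(task, "->", output)
```

In this toy version, keyword rules stand in for the trained task-identification model, and each agent returns a placeholder output path; the point is only to show the separation between identifying the requested transformation and dispatching it to a dedicated agent that runs locally.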