This paper presents an architecture that leverages large language models (LLMs) and real-time speech recognition to let users perform GUI operations through natural language and receive system responses directly in the GUI. The architecture enhances voice-based accessibility by exposing the application's navigation graph and semantics via the Model Context Protocol (MCP): tools applicable to the currently visible view are provided by its ViewModel in the Model-View-ViewModel (MVVM) pattern, and application-wide tools are extracted from the router of the GUI tree. Furthermore, we evaluate the performance of a locally deployable open-weight LLM to address privacy and data security concerns, and present the hardware requirements for achieving fast response times.
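
To make the split between view-scoped and application-wide tools concrete, the following Kotlin sketch illustrates one possible shape of the idea: a ViewModel contributes tools for the currently visible view, while a router derived from the navigation graph contributes navigation tools. All identifiers (McpTool, ToolProvider, OrderFormViewModel, AppRouter, visibleTools) are hypothetical and are not taken from the paper or from any MCP SDK; this is a minimal illustration, not the paper's implementation.

```kotlin
// Hypothetical sketch of view-scoped and application-wide tool providers.
// None of these names come from the paper or from an actual MCP SDK.

// A tool as it might be exposed over MCP: a name, a description the LLM uses
// for tool selection, and a handler that performs the GUI operation.
data class McpTool(
    val name: String,
    val description: String,
    val handler: (arguments: Map<String, String>) -> String,
)

// Anything that can contribute tools to the MCP server.
interface ToolProvider {
    fun tools(): List<McpTool>
}

// View-scoped tools: the ViewModel of the visible view exposes only the
// operations that make sense for that view (here, a hypothetical order form).
class OrderFormViewModel : ToolProvider {
    private var quantity = 1

    override fun tools(): List<McpTool> = listOf(
        McpTool(
            name = "set_quantity",
            description = "Set the order quantity shown in the quantity field.",
            handler = { args ->
                quantity = args["quantity"]?.toIntOrNull() ?: quantity
                "Quantity set to $quantity"
            },
        ),
        McpTool(
            name = "submit_order",
            description = "Submit the order form currently on screen.",
            handler = { "Order submitted with quantity $quantity" },
        ),
    )
}

// Application-wide tools derived from the navigation graph: one navigation
// tool per reachable destination in the GUI tree.
class AppRouter(private val destinations: List<String>) : ToolProvider {
    var currentDestination: String = destinations.first()
        private set

    override fun tools(): List<McpTool> = destinations.map { dest ->
        McpTool(
            name = "navigate_to_$dest",
            description = "Navigate to the '$dest' screen.",
            handler = {
                currentDestination = dest
                "Navigated to $dest"
            },
        )
    }
}

// The tool list offered to the LLM is the union of router-level tools and
// the tools of the ViewModel backing the currently visible view.
fun visibleTools(router: AppRouter, activeViewModel: ToolProvider): List<McpTool> =
    router.tools() + activeViewModel.tools()

fun main() {
    val router = AppRouter(listOf("home", "orders", "settings"))
    val viewModel = OrderFormViewModel()

    // The LLM would pick a tool by name from this list; here we just print it.
    visibleTools(router, viewModel).forEach { println("${it.name}: ${it.description}") }

    // Simulated tool call, as it might arrive from the LLM via MCP.
    val result = visibleTools(router, viewModel)
        .first { it.name == "set_quantity" }
        .handler(mapOf("quantity" to "3"))
    println(result)
}
```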