This paper highlights a key limitation of existing interactive digital maps: because they rely on GIS databases, they struggle to answer visual questions about what the world looks like. To overcome this limitation, we propose the concept of Geo-Visual Agents: multimodal AI agents capable of understanding and answering visual-spatial questions by analyzing large-scale geospatial image repositories, such as streetscapes, place-based photos, and aerial photographs, together with existing GIS data. We define the vision for Geo-Visual Agents, describe their sensing and interaction methods, present three examples, and outline key challenges and opportunities for future research.