This paper proposes ArtRAG, a novel framework for interpreting artworks from multiple perspectives: cultural, historical, and stylistic. Existing multimodal large language models (MLLMs) often fail to capture the nuances of art interpretation. To address this, ArtRAG uses an Art Contextual Knowledge Graph (ACKG) built automatically from domain-specific textual sources. The ACKG organizes entities such as artists, art movements, subjects, and historical events into an interpretable graph. A multi-granular structured retriever then selects relevant subgraphs and uses them to guide the MLLM's generation. Experiments on the SemArt and Artpedia datasets show that ArtRAG outperforms existing models, and human evaluations indicate that it produces coherent, insightful, and culturally rich interpretations.
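To make the retrieve-then-generate idea concrete, here is one generic way to select a subgraph from a triple-based knowledge graph and turn it into text context for an MLLM prompt. This is a minimal sketch under stated assumptions, not ArtRAG's implementation. The example triples and the functions `build_adjacency`, `retrieve_subgraph`, and `to_prompt_context` are hypothetical. A plain k-hop breadth-first expansion from seed entities also stands in for the paper's multi-granular retriever, whose scoring this summary does not describe.

```python
from collections import deque

# Hypothetical ACKG-style data: (head, relation, tail) triples.
TRIPLES = [
    ("Claude Monet", "member_of", "Impressionism"),
    ("Impressionism", "emerged_in", "19th-century France"),
    ("Water Lilies", "painted_by", "Claude Monet"),
    ("Water Lilies", "depicts", "Giverny garden pond"),
    ("Impressionism", "influenced_by", "Barbizon school"),
]


def build_adjacency(triples):
    """Index every triple under both of its endpoint entities."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((h, r, t))
        adj.setdefault(t, []).append((h, r, t))
    return adj


def retrieve_subgraph(adj, seeds, max_hops=2, max_triples=10):
    """Breadth-first expansion from seed entities (assumed stand-in for
    ArtRAG's retriever): collect triples within max_hops of a seed."""
    seen_nodes = set(seeds)
    seen_triples = set()
    selected = []
    frontier = deque((s, 0) for s in seeds if s in adj)
    while frontier and len(selected) < max_triples:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue  # do not expand beyond the hop limit
        for triple in adj[node]:
            if triple in seen_triples:
                continue
            seen_triples.add(triple)
            selected.append(triple)
            if len(selected) >= max_triples:
                break
            h, _, t = triple
            neighbor = t if h == node else h
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return selected


def to_prompt_context(triples):
    """Serialize the selected subgraph as lines of text for the MLLM prompt."""
    return "\n".join(f"{h} --{r}--> {t}" for h, r, t in triples)


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    # Seed entity assumed to come from recognizing the painting in the image.
    sub = retrieve_subgraph(adj, ["Water Lilies"], max_hops=2)
    print(to_prompt_context(sub))
```

Seeding from "Water Lilies" with two hops reaches the artist and his movement, but not facts three hops away, such as where Impressionism emerged. Widening or narrowing `max_hops` is a crude version of the granularity trade-off a structured retriever has to manage.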