In this paper, we present the Multilingual End-to-End Meta-Evaluation RAG Benchmark (MEMERAG). Existing benchmarks for the automatic evaluation of retrieval-augmented generation (RAG) systems are English-centric or rely on data translated from English, which fails to capture cultural nuances. MEMERAG builds on the MIRACL dataset: for each language, native-language questions are answered by multiple large language models (LLMs), and the responses are then assessed by expert annotators for faithfulness and relevance. We describe the annotation process and its high inter-annotator agreement, analyse the performance of the LLMs across languages, and benchmark multilingual automatic evaluators (LLM-as-a-judge) on the resulting dataset. We show that the benchmark can reliably identify the improvements offered by advanced prompting techniques and stronger LLMs, and we make the dataset publicly available on GitHub.
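
The sketch below illustrates, purely as an assumption-laden example and not the paper's actual protocol, how an LLM-as-a-judge could be meta-evaluated against expert annotations on a per-language basis. The label set, data layout, and the choice of Cohen's kappa as the agreement metric are all hypothetical.

```python
# Minimal sketch: score an automatic evaluator (LLM-as-a-judge) against
# human gold labels, separately for each language. Everything here is
# illustrative; MEMERAG's exact labels and metrics may differ.
from collections import defaultdict
from sklearn.metrics import cohen_kappa_score

# Hypothetical records: (language, human_label, judge_label), where the
# label marks whether a RAG answer is faithful to its retrieved context.
annotations = [
    ("de", "faithful", "faithful"),
    ("de", "unfaithful", "faithful"),
    ("hi", "faithful", "faithful"),
    ("hi", "unfaithful", "unfaithful"),
    # ... one entry per annotated (question, answer) pair
]

# Group the human and judge labels by language.
per_language = defaultdict(lambda: ([], []))
for lang, human, judge in annotations:
    per_language[lang][0].append(human)
    per_language[lang][1].append(judge)

# Report judge-vs-human agreement for each language.
for lang, (human_labels, judge_labels) in per_language.items():
    kappa = cohen_kappa_score(human_labels, judge_labels)
    print(f"{lang}: judge-vs-human Cohen's kappa = {kappa:.2f}")
```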