This paper identifies quality issues in existing benchmarks for evaluating Knowledge Graph Question Answering (KGQA) systems and proposes KGQAGen, an LLM-based framework that addresses them. Motivated by the low accuracy (57%) observed in existing KGQA benchmarks, KGQAGen combines a structured knowledge base, LLM-based generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a 10,000-instance benchmark grounded in Wikidata, and evaluate a range of KG-RAG models on it. Experimental results show that even state-of-the-art systems struggle on this benchmark, exposing the limitations of current models.