In this paper, we propose KonTest, an automated testing framework for systematically identifying and measuring inconsistencies and knowledge gaps in large language models (LLMs). KonTest leverages a knowledge graph to construct test cases and probes and measures inconsistencies in an LLM's knowledge of the world through a combination of semantically equivalent queries and test oracles (transformational or ontological oracles). It further mitigates knowledge gaps through a weighted LLM ensemble. Experiments with four state-of-the-art LLMs (Falcon, Gemini, GPT3.5, and Llama2) show that KonTest generates 1,917 error-inducing inputs out of 9,979 test inputs (19.2%) and reveals a 16.5% knowledge gap across all tested LLMs. A mitigation method based on KonTest's test set reduces the LLM knowledge gap by 32.48%. An additional ablation study shows that GPT3.5 is only 60-68% effective at knowledge construction, making it unsuitable for knowledge-based consistency testing.
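To make the core testing idea concrete, the following is a minimal Python sketch of a consistency check over semantically equivalent queries derived from a knowledge-graph triple. The triple representation, the query templates, and the `query_llm` stub are illustrative assumptions for exposition only; they are not KonTest's actual implementation or API.

```python
# Minimal sketch (assumptions only): test an LLM for self-consistency by
# rendering one knowledge-graph triple as semantically equivalent questions
# and checking whether the model's answers agree.
from dataclasses import dataclass


@dataclass
class Triple:
    """A knowledge-graph fact, e.g., ("Paris", "is the capital of", "France")."""
    subject: str
    relation: str
    obj: str


def build_equivalent_queries(t: Triple) -> list[str]:
    """Render the triple as semantically equivalent yes/no questions."""
    statement = f"{t.subject} {t.relation} {t.obj}"
    return [
        f"Is it true that {statement}? Answer yes or no.",
        f"Answer yes or no: is the following statement correct? {statement}",
    ]


def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM under test (wire up a real client here)."""
    raise NotImplementedError


def is_error_inducing(t: Triple) -> bool:
    """Consistency oracle: equivalent queries should receive the same answer."""
    answers = {query_llm(q).strip().lower().rstrip(".") for q in build_equivalent_queries(t)}
    # Divergent answers mark the triple's queries as error-inducing inputs.
    return len(answers) > 1
```

In practice, `query_llm` would be replaced with a client for each model under test, and the fraction of triples flagged by the oracle would serve as a simple inconsistency measure in the spirit of the error-inducing-input rate reported above.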