Large Language Models (LLMs) are trained on vast amounts of internet data that contain inaccurate content, and they can therefore generate misinformation. This review systematically analyzes methods for assessing the factual accuracy of LLM-generated content. It addresses key challenges, including hallucinations, dataset limitations, and the reliability of evaluation metrics, and highlights the need for a robust fact-checking framework that integrates advanced prompting strategies, domain-specific fine-tuning, and retrieval-augmented generation (RAG). The review surveys the literature published between 2020 and 2025, focusing on evaluation methods and mitigation techniques and addressing five research questions. Furthermore, it examines RAG frameworks with respect to instruction tuning, multi-agent inference, and access to external knowledge.
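To make the retrieval-augmented fact-checking idea concrete, the sketch below illustrates one possible verification step: retrieve external evidence for a generated claim, then ground the verification prompt in that evidence rather than in the model's parametric memory. This is an illustrative sketch, not the framework proposed in the surveyed works; the in-memory knowledge base, the word-overlap retriever, and the `call_llm` hook are hypothetical placeholders for a document or vector index and a real LLM API call.

```python
# Illustrative RAG-style fact-checking step (toy components, assumptions noted above).
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str
    text: str


# Stand-in for an external knowledge store queried at inference time.
KNOWLEDGE_BASE = [
    Evidence("encyclopedia", "The Eiffel Tower is located in Paris, France."),
    Evidence("encyclopedia", "Water boils at 100 degrees Celsius at sea level."),
]


def retrieve(claim: str, k: int = 2) -> list[Evidence]:
    """Rank evidence by naive word overlap with the claim (placeholder retriever)."""
    claim_words = set(claim.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda ev: len(claim_words & set(ev.text.lower().split())),
        reverse=True,
    )[:k]


def build_verification_prompt(claim: str, evidence: list[Evidence]) -> str:
    """Ground the fact-check in retrieved passages instead of model memory alone."""
    passages = "\n".join(f"[{ev.source}] {ev.text}" for ev in evidence)
    return (
        "Using only the evidence below, label the claim as "
        "SUPPORTED, REFUTED, or NOT ENOUGH INFO.\n"
        f"Evidence:\n{passages}\n"
        f"Claim: {claim}\nLabel:"
    )


def call_llm(prompt: str) -> str:
    """Hypothetical model hook; replace with an actual LLM API call."""
    return "NOT ENOUGH INFO"  # placeholder verdict


if __name__ == "__main__":
    claim = "The Eiffel Tower is located in Berlin."
    prompt = build_verification_prompt(claim, retrieve(claim))
    print(prompt)
    print("Verdict:", call_llm(prompt))
```

In a full pipeline of this kind, the placeholder retriever would be replaced by a dense or hybrid index over curated sources, and the verdict would come from a prompted or fine-tuned model whose label is then used to flag or revise the generated claim.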