This paper proposes CoT-Self-Instruct, a method for generating high-quality synthetic training data for large language models (LLMs). Given seed tasks, CoT-Self-Instruct first has the LLM reason and plan via Chain-of-Thought (CoT), and then generate new synthetic prompts of similar quality and complexity. A filtering step then selects high-quality data using automatic evaluation metrics, and the selected data is used for LLM training. Experimental results show that on verifiable reasoning tasks (MATH500, AMC23, AIME24, GPQA-Diamond), CoT-Self-Instruct data outperforms existing training datasets such as s1k and OpenMathReasoning, and that on non-verifiable instruction-following tasks (AlpacaEval 2.0, Arena-Hard), it outperforms both human-written prompts and standard Self-Instruct data.
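As a rough illustration, the generate-then-filter loop described above might look like the following minimal sketch. The prompt template, the `llm` callable, and the majority-vote consistency filter are illustrative assumptions for exposition, not the paper's exact implementation or metrics.

```python
# Minimal sketch of a CoT-Self-Instruct-style pipeline (assumptions noted above).
import random

COT_TEMPLATE = """Here are example tasks:
{seeds}

First, reason step by step about what makes these tasks high-quality
and complex. Then write ONE new task of similar quality and complexity.
Output the new task after the line 'New task:'."""

def generate_candidate(llm, seed_pool, k=2):
    """Sample k seed tasks, have the LLM plan via CoT, then emit a new task."""
    seeds = "\n".join(random.sample(seed_pool, k))
    completion = llm(COT_TEMPLATE.format(seeds=seeds))
    # Keep only the final synthetic prompt, discarding the CoT plan.
    return completion.split("New task:")[-1].strip()

def keep(llm, prompt, n=8, threshold=0.5):
    """Hypothetical consistency filter: sample n answers and keep the prompt
    only if a majority agree, as a proxy for a well-posed, answerable task."""
    answers = [llm(prompt) for _ in range(n)]
    majority = max(answers, key=answers.count)
    return answers.count(majority) / n >= threshold

def build_dataset(llm, seed_pool, target_size):
    """Loop until enough filtered synthetic prompts have been collected."""
    data = []
    while len(data) < target_size:
        candidate = generate_candidate(llm, seed_pool)
        if keep(llm, candidate):
            data.append(candidate)
    return data
```

Here `llm` stands in for any string-to-string completion call; the paper's actual filters score candidates with automatic evaluation metrics before the data is used for training.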