Large language models (LLMs) have transformed AI, but they still make mistakes and can explore unproductive reasoning paths. Self-correction is essential for deploying LLMs in safety-critical applications. We uncover a systematic failure of LLMs to correct errors in their own outputs, a phenomenon we call the "Self-Correction Blind Spot": LLMs successfully correct identical errors when they appear in external inputs, yet fail to correct them in their own generations. To investigate this, we introduce Self-Correction Bench, an evaluation framework that measures the phenomenon through controlled error injection at three complexity levels. Testing 14 open-source non-reasoning models, we find an average blind spot rate of 64.5%. Several lines of evidence suggest this limitation is shaped by training data: human demonstrations rarely contain error-correction sequences, whereas reinforcement learning (RL) trained models learn to correct errors through outcome feedback. Notably, appending a minimal "Wait" prompt reduces blind spots by 89.3%, suggesting the capability exists but requires triggering. Our findings highlight an important limitation that may stem from training distributions and offer a practical approach for improving LLM reliability.
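To make the evaluation setup concrete, the sketch below shows one plausible way to operationalize the blind spot rate and the minimal "Wait" intervention. It is illustrative only, not the benchmark's released code: the function names (blind_spot_rate, with_wait_trigger) and the exact metric definition are assumptions for exposition.

from typing import List

def blind_spot_rate(corrected_self: List[bool], corrected_external: List[bool]) -> float:
    """Fraction of injected errors that the model corrects when presented as
    external input but fails to correct in its own output (assumed metric)."""
    blind = [ext and not own for own, ext in zip(corrected_self, corrected_external)]
    return sum(blind) / max(len(blind), 1)

def with_wait_trigger(partial_output_with_error: str) -> str:
    """Append a minimal 'Wait' cue to an erroneous partial output so the model
    continues from a self-correction trigger rather than the raw error."""
    return partial_output_with_error.rstrip() + "\nWait,"

if __name__ == "__main__":
    # Toy outcomes for 5 injected errors: corrected in 4/5 external cases
    # but only 1/5 self cases, giving a high blind spot rate.
    corrected_external = [True, True, True, True, False]
    corrected_self = [True, False, False, False, False]
    print(f"blind spot rate: {blind_spot_rate(corrected_self, corrected_external):.2f}")
    print(with_wait_trigger("The answer is 12 because 3 * 5 = 12."))

In an actual evaluation, the injected-error continuations would be fed back to the model and its responses judged for whether the error was corrected; the snippet only illustrates how the resulting outcomes could be aggregated.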