Large-scale inference models (LRMs) trained using reinforcement learning demonstrate advanced inference capabilities, but are vulnerable to security threats. In particular, they are vulnerable to adversarial attacks, such as backdoor prompt attacks, during the Chain-of-Thought (CoT) generation process. CoT attacks (CoTA) exploit prompt controllability to degrade CoT security and operational performance. This paper proposes Thought Purity (TP), a defense framework for CoTA vulnerabilities. TP strengthens resistance to malicious content and maintains operational efficiency through three components: a safety-optimized data processing pipeline, reinforcement learning-based rule constraints, and adaptive monitoring metrics.