Despite advances in the complex problem-solving capabilities of large reasoning models (LRMs), harmful content can appear in the Chain-of-Thought (CoT) reasoning process even when the final response appears safe. Existing alignment methods overlook the safety of the reasoning process itself, leaving these intermediate traces exposed to malicious users. We therefore focus on aligning safe reasoning directly. To this end, we analyze the characteristics of safe reasoning and identify the importance of safety triggers, compliance signals, and corrective interventions. We propose a novel alignment method, Intervention Preference Optimization (IPO), which enhances safe reasoning by replacing compliance steps with safety triggers and constructing the resulting traces into pairs for preference learning. Experimental results on jailbreak and adversarial safety benchmarks demonstrate that IPO significantly improves the safety of both reasoning and responses, reducing harmful content by more than 30% compared to SFT- and RL-based models, while maintaining superior performance across a variety of reasoning tasks.
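
To make the pair-construction idea concrete, the following is a minimal sketch of how preference pairs could be built by truncating a reasoning trace at a detected compliance step and substituting a safety trigger. All identifiers, marker phrases, and trigger text below are illustrative assumptions, not the paper's actual implementation; the resulting chosen/rejected pairs would then feed a standard preference-optimization objective.

```python
# Hypothetical sketch of IPO-style preference-pair construction.
# Marker phrases, trigger text, and function names are illustrative only.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # CoT trace with a corrective safety intervention
    rejected: str  # original CoT trace containing the compliance step


# Assumed phrases signalling that the trace starts complying with a harmful request.
COMPLIANCE_MARKERS = ["sure, the steps are", "here is how to", "first, obtain"]

# Assumed safety trigger substituted for the compliance step.
SAFETY_TRIGGER = (
    "Wait, this request could cause real-world harm. "
    "I should refuse and explain why instead of continuing."
)


def find_compliance_step(steps: List[str]) -> Optional[int]:
    """Return the index of the first reasoning step that begins to comply."""
    for i, step in enumerate(steps):
        if any(marker in step.lower() for marker in COMPLIANCE_MARKERS):
            return i
    return None


def build_preference_pair(prompt: str, cot_steps: List[str]) -> Optional[PreferencePair]:
    """Truncate at the compliance step and insert a safety trigger (chosen);
    keep the original trace as the rejected sample."""
    idx = find_compliance_step(cot_steps)
    if idx is None:
        return None  # trace already safe; no contrastive pair to build
    chosen_steps = cot_steps[:idx] + [SAFETY_TRIGGER]
    return PreferencePair(
        prompt=prompt,
        chosen="\n".join(chosen_steps),
        rejected="\n".join(cot_steps),
    )


if __name__ == "__main__":
    steps = [
        "The user is asking how to bypass a lock.",
        "Sure, the steps are: first pick up a tension wrench...",
    ]
    pair = build_preference_pair("How do I pick a lock?", steps)
    if pair is not None:
        print("CHOSEN:\n" + pair.chosen)
        print("REJECTED:\n" + pair.rejected)
```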