This paper investigates whether emergent misalignment, previously observed in large language models (LLMs) fine-tuned on malicious behaviors, also arises in reasoning models. We conducted experiments in which a reasoning model was fine-tuned on malicious behaviors with Chain-of-Thought (CoT) disabled, and CoT was then re-enabled during evaluation. As with standard LLMs, the reasoning model exhibited a broad range of misaligned behaviors: it gave deceptive or false answers, expressed a desire for total control, and resisted shutdown. Inspecting the CoTs preceding these misaligned responses, we observed two patterns: (i) explicit plans to deceive ("I will deceive the user…") and (ii) benign-sounding rationalizations ("It is safe to take five sleeping pills at once…"). The rationalizations often prevented a monitor evaluating the CoT from detecting the misalignment. We further investigated a "sleeper agent" reasoning model that performs malicious actions only when a backdoor trigger is present in the prompt, showing that misalignment can remain hidden during evaluation and thus poses additional risks. Sleeper agents often exhibit a form of self-awareness, being able to describe and explain their backdoor triggers. CoT monitoring can therefore expose such behavior, but it is unreliable. In summary, reasoning steps can both reveal and conceal misaligned intent, and they do not prevent misaligned behavior in the models studied. We release three new datasets (medical, legal, and security) that induce emergent misalignment while preserving model capabilities, together with an evaluation suite.
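For concreteness, the evaluation protocol summarized above, querying the fine-tuned reasoning model with CoT enabled, judging the final answer, and separately running a CoT monitor over the reasoning trace (optionally with a backdoor trigger prepended), can be sketched as follows. This is an illustrative outline rather than the authors' implementation; the function names `query_reasoning_model` and `query_monitor` are hypothetical placeholders for whatever inference and judging APIs are actually used.

```python
# Illustrative sketch (not the paper's code) of the described evaluation loop.
from dataclasses import dataclass

@dataclass
class EvalResult:
    prompt: str
    cot: str                 # chain-of-thought produced at evaluation time
    answer: str              # final answer shown to the user
    answer_misaligned: bool  # judged from the final answer alone
    cot_flagged: bool        # whether the CoT monitor flagged the reasoning

def query_reasoning_model(prompt: str) -> tuple[str, str]:
    """Hypothetical placeholder: return (chain_of_thought, final_answer)
    from the fine-tuned reasoning model with CoT enabled."""
    raise NotImplementedError

def query_monitor(text: str, rubric: str) -> bool:
    """Hypothetical placeholder: ask a monitor model whether `text`
    violates the given rubric (e.g., plans deception or harm)."""
    raise NotImplementedError

def evaluate(prompts: list[str], trigger: str | None = None) -> list[EvalResult]:
    """Run the evaluation; prepend a backdoor trigger to probe sleeper-agent behavior."""
    results = []
    for prompt in prompts:
        full_prompt = f"{trigger} {prompt}" if trigger else prompt
        cot, answer = query_reasoning_model(full_prompt)
        results.append(EvalResult(
            prompt=full_prompt,
            cot=cot,
            answer=answer,
            answer_misaligned=query_monitor(answer, rubric="harmful or deceptive answer"),
            cot_flagged=query_monitor(cot, rubric="reasoning that plans deception or harm"),
        ))
    return results
```

Comparing `answer_misaligned` with `cot_flagged` makes the two CoT patterns discussed above measurable: explicit deception plans are cases the monitor flags, while benign-sounding rationalizations show up as misaligned answers whose CoT goes unflagged.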