Language models can retain dangerous knowledge and skills even after extensive safety fine-tuning, posing both misuse and misalignment risks. We point out that existing unlearning methods are easily reversed, and systematically evaluate components of existing and novel unlearning methods to identify the key elements of irreversible unlearning. We introduce Disruption Masking, a technique that permits a weight update only where the unlearning gradient and the retaining gradient share the same sign, ensuring that every update is non-disruptive. We also confirm the need to normalize the unlearning gradient and verify the usefulness of meta-learning, and combine these insights into MUDMAN (Meta-Unlearning with Disruption Masking and Normalization). We show that MUDMAN is effective at preventing the recovery of dangerous capabilities: it outperforms the prior state-of-the-art TAR method by 40%, setting a new state of the art for robust unlearning.
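
To make the masking rule concrete, the following is a minimal PyTorch-style sketch of a Disruption-Masking step on a single parameter tensor. The function name, learning rate, and the simple norm-based scaling used as a stand-in for the normalization component are illustrative assumptions, not the authors' implementation.

```python
import torch

def disruption_masked_step(param: torch.nn.Parameter,
                           unlearn_grad: torch.Tensor,
                           retain_grad: torch.Tensor,
                           lr: float = 1e-3) -> None:
    """Apply an unlearning update only at coordinates where it agrees in sign
    with the retaining gradient, so the step locally does not increase the
    retain loss (i.e., it is non-disruptive)."""
    # Keep only coordinates where both gradients point in the same direction.
    mask = (torch.sign(unlearn_grad) == torch.sign(retain_grad)).float()
    masked_grad = unlearn_grad * mask
    # Scale the masked gradient by its norm so step sizes stay comparable
    # across tensors (assumed stand-in for the paper's normalization scheme).
    masked_grad = masked_grad / (masked_grad.norm() + 1e-8)
    # Descent step on the unlearning objective, restricted to the masked entries.
    param.data.add_(masked_grad, alpha=-lr)
```

Because the surviving coordinates move both the unlearning objective and the retaining objective in a consistent direction, the masked step removes the targeted capability without (locally) degrading retained behavior.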