We propose Adaptive Computation Pruning (ACP), a method that improves the efficiency of the Forgetting Transformer (FoX). FoX outperforms the standard Transformer by adding a forget gate to softmax attention, and we observe that many of its attention heads forget information quickly, so distant input-output dependencies contribute little to their outputs. ACP exploits this observation by dynamically skipping computations involving input-output dependencies that are strongly attenuated by the forget gate. A dynamically set pruning threshold ensures that the pruned attention weights remain negligible, making the pruning safe. Applied to FoX in language model pretraining, ACP reduces attention FLOPs and memory accesses by approximately 70%, which translates into a 50-70% reduction in attention execution time (a 2-3x speedup) and a 10-40% increase in end-to-end training throughput. The computational savings grow with context length. All of these speedups are achieved without compromising model performance.
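For illustration, the following is a minimal PyTorch sketch of blockwise causal attention with a forget-gate decay bias, where key blocks whose best-case decay already falls below a threshold are skipped entirely. The single-head layout, the block size, and the fixed log-space threshold `log_eps` are simplifying assumptions for exposition; they stand in for the dynamically set pruning threshold and the fused attention kernel described in the text, and are not the paper's actual implementation.

```python
import torch

def pruned_fox_attention(q, k, v, log_fgate, block=64, log_eps=-20.0):
    """Blockwise causal attention with forget-gate decay and block-level pruning.

    q, k, v: (T, d) tensors; log_fgate: (T,) log forget-gate values in (-inf, 0].
    """
    T, d = q.shape
    c = torch.cumsum(log_fgate, dim=0)          # c[t] = sum_{s<=t} log f_s
    out = torch.zeros_like(q)
    for qs in range(0, T, block):               # loop over query blocks
        qe = min(qs + block, T)
        scores, values = [], []
        for ks in range(0, qe, block):          # loop over causal key blocks
            ke = min(ks + block, T)
            # Least-attenuated pair in this block pair: query qs, key ke-1.
            # If even that pair's log-decay is below the threshold, every
            # attention weight in the block pair is negligible, so skip it.
            max_log_decay = float(c[qs] - c[ke - 1]) if ke - 1 <= qs else 0.0
            if max_log_decay < log_eps:
                continue                        # pruned: no FLOPs, no memory access
            s = q[qs:qe] @ k[ks:ke].T / d ** 0.5
            s = s + (c[qs:qe, None] - c[None, ks:ke])   # forget-gate decay bias
            iq = torch.arange(qs, qe)[:, None]
            ik = torch.arange(ks, ke)[None, :]
            s = s.masked_fill(iq < ik, float("-inf"))   # causal mask
            scores.append(s)
            values.append(v[ks:ke])
        w = torch.softmax(torch.cat(scores, dim=-1), dim=-1)
        out[qs:qe] = w @ torch.cat(values, dim=0)
    return out
```

Because the diagonal block is never pruned and the cumulative log forget-gate values only decrease with distance, the skipped blocks are exactly those whose contributions the decay has already driven toward zero, which is why the savings grow as the context gets longer.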