This paper aims to develop a fundamental understanding of modality collapse, a phenomenon observed in multimodal fusion models. We show that modality collapse arises when noisy features from one modality become entangled with the predictive features of other modalities through shared neurons in the fusion head, obscuring that modality's positive contribution to the prediction. We demonstrate that cross-modal knowledge distillation decouples these representations by alleviating the rank bottleneck in the student encoder and removes noise from the fusion-head output without degrading the predictive features of any modality. Based on these results, we propose an algorithm that prevents modality collapse through explicit basis reassignment and demonstrate its applicability to handling missing modalities. We validate our theoretical claims through extensive experiments on multiple multimodal benchmarks.