This paper highlights the limitations of existing robustness assessments of machine reading comprehension (MRC) models, which rely primarily on artificial perturbations. We propose a novel framework for assessing the robustness of MRC models under naturally occurring text perturbations, leveraging Wikipedia edit history. Experiments on the SQuAD dataset across various model architectures show that natural perturbations degrade the performance of pre-trained encoder language models, and that even state-of-the-art Flan-T5 and large language models (LLMs) exhibit such errors. We further demonstrate that robustness can be improved by training on data augmented with either natural or artificial perturbations, although a performance gap remains relative to unperturbed data.