Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension

Created by
  • Haebom

Author

Yulong Wu, Viktor Schlegel, Riza Batista-Navarro

Outline

This paper highlights the limitations of existing robustness evaluations of machine reading comprehension (MRC) models, which rely primarily on artificial perturbations, and proposes a novel framework for evaluating MRC robustness against naturally occurring text perturbations, leveraging Wikipedia edit history. Experiments on the SQuAD dataset across various model architectures show that natural perturbations degrade the performance of pre-trained encoder language models, and that even state-of-the-art Flan-T5 models and large language models (LLMs) are susceptible to these errors. Furthermore, training on either naturally or artificially perturbed data improves robustness, but a performance gap relative to unperturbed data remains.
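The evaluation idea is straightforward to illustrate: ask the same question against an original passage and against a version containing a naturally occurring edit, then compare the model's predictions. Below is a minimal sketch of this comparison (not the authors' code); the model checkpoint and the perturbed sentence are assumptions chosen for illustration.

```python
# Minimal sketch: compare an extractive QA model's prediction on an
# original SQuAD-style passage vs. a naturally edited variant.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

question = "Which team won Super Bowl 50?"
original = ("Super Bowl 50 was won by the Denver Broncos, "
            "who defeated the Carolina Panthers 24-10.")
# Hypothetical natural perturbation: a wording edit of the kind found in
# Wikipedia revision histories, preserving the correct answer.
perturbed = ("Super Bowl 50 ended in victory for the Denver Broncos, "
             "who beat the Carolina Panthers 24-10.")

for name, context in [("original", original), ("perturbed", perturbed)]:
    pred = qa(question=question, context=context)
    print(f"{name}: answer={pred['answer']!r}, score={pred['score']:.3f}")
```

A robustness gap shows up when the predicted answer or its confidence changes between the two contexts even though the underlying facts did not.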

Takeaways, Limitations

Takeaways:
We move beyond robustness assessments that rely on artificial perturbations by proposing a new evaluation framework based on naturally occurring perturbations.
We experimentally demonstrate that even state-of-the-art MRC models are vulnerable to natural text perturbations.
We suggest that model robustness can be improved by training on naturally or artificially perturbed data (see the sketch after this list).
Limitations:
Natural perturbations drawn from Wikipedia edit history cover only one source of perturbation; generalizability to other types of natural perturbation requires further study.
The proposed training methods improve robustness to natural perturbations but still do not fully close the performance gap with unperturbed data.
The framework may not fully encompass the diverse textual perturbations found in the real world.
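To make the data-augmentation takeaway above concrete, here is a small sketch (my construction, not the paper's pipeline): `naturally_perturb` is a hypothetical stand-in for mining an edit from Wikipedia revision history, and the filter keeps only answer-preserving perturbations so the gold labels remain valid.

```python
# Sketch of perturbation-based augmentation for an MRC training set.

def naturally_perturb(context: str) -> str:
    # Placeholder: a real implementation would align the passage with a
    # later Wikipedia revision and splice in the human-made edit.
    return context.replace("was won by", "ended in victory for")

def augment(dataset: list[dict]) -> list[dict]:
    """Return the dataset plus one perturbed copy of each example."""
    augmented = list(dataset)
    for ex in dataset:
        perturbed = naturally_perturb(ex["context"])
        # Keep the copy only if it actually changed and the gold answer
        # is still recoverable from the perturbed context.
        if perturbed != ex["context"] and ex["answer"] in perturbed:
            augmented.append({**ex, "context": perturbed})
    return augmented

train = [{
    "question": "Which team won Super Bowl 50?",
    "context": "Super Bowl 50 was won by the Denver Broncos.",
    "answer": "Denver Broncos",
}]
print(len(augment(train)))  # 2: the original plus its perturbed copy
```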