Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Created by
  • Haebom

Authors

Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

Outline

This paper analyzes the impact of adversarial attacks on model interpretability in text classification. We develop a machine-learning-based classification model for text data, introduce adversarial perturbations, and evaluate classification performance after the attack. We then analyze and compare the model's explanations before and after the attack. The work is part of a broader line of research on the vulnerability of deep learning models to adversarial attacks, which can have serious consequences in areas such as autonomous driving, medical diagnosis, and security systems.
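Below is a minimal sketch of such a pipeline. The summary does not name the paper's dataset, model, attack, or explanation method, so the toy data, the TF-IDF + logistic regression classifier, the word-deletion `perturb` attack, and the use of LIME are all illustrative assumptions, not the authors' actual setup.

```python
# Illustrative pipeline (assumptions throughout): a toy sentiment classifier,
# a simple word-deletion stand-in "attack", and LIME explanations before/after.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; the paper's actual dataset is not named in this summary.
texts = ["great movie, loved it", "terrible plot, awful acting",
         "wonderful and moving", "boring and bad"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def perturb(text, target=1):
    """Hypothetical stand-in attack: delete the single word whose removal
    most lowers the model's confidence in the target class."""
    words = text.split()
    candidates = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
    scores = clf.predict_proba(candidates)[:, target]
    return candidates[scores.argmin()]

# Compare explanations before and after the perturbation.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
original = "great movie, loved it"
attacked = perturb(original)

for name, sample in [("before attack", original), ("after attack", attacked)]:
    exp = explainer.explain_instance(sample, clf.predict_proba, num_features=3)
    print(name, exp.as_list())
```

In a real study, the stand-in `perturb` would be replaced with the paper's actual attack (for example, a synonym-substitution method) and the classifier with its actual model.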

Takeaways, Limitations

Takeaways: By quantitatively analyzing how adversarial attacks affect both the performance and the interpretability of text classification models, the paper points to ways of improving model security and reliability. In particular, tracking how the model's explanations change under attack can help identify vulnerabilities and inform defense strategies.
Limitations: The generalizability of the dataset and model used in the paper needs review; experiments across diverse datasets and models are required to confirm the results. The analysis may also cover too narrow a range of adversarial attack types, so stronger attack techniques merit study. Finally, interpretability lacks clear, agreed-upon quantitative evaluation metrics; one candidate metric is sketched below.
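As one example of what such a metric could look like (an assumption for illustration, not a metric the paper defines), the Jaccard overlap of the top-k attribution features before and after an attack gives a simple score of explanation stability:

```python
def topk_jaccard(exp_before, exp_after, k=5):
    """Hypothetical stability metric: Jaccard overlap of the k features
    with the largest absolute weights in two explanations, each given as
    a list of (feature, weight) pairs (e.g., LIME's exp.as_list()).
    Returns a value in [0, 1]; lower means the attack shifted the
    explanation more."""
    def top(exp):
        ranked = sorted(exp, key=lambda fw: abs(fw[1]), reverse=True)
        return {feature for feature, _ in ranked[:k]}
    a, b = top(exp_before), top(exp_after)
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

With the pipeline sketched earlier, `topk_jaccard(exp_before.as_list(), exp_after.as_list())` would quantify how far the attacked explanation drifts from the original.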