Daily Arxiv

This page collects and organizes papers related to artificial intelligence published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

Created by
  • Haebom

Author

Xiaoyu Xu, Xiang Yue, Yang Liu, Qingqing Ye, Huadi Zheng, Peizhao Hu, Minxin Du, Haibo Hu

Outline

The effectiveness of unlearning, which removes specific data from large language models (LLMs), is typically evaluated with task-level metrics such as accuracy or perplexity. We demonstrate, however, that these metrics can be misleading: a model may appear to have forgotten, yet its original behavior can be restored with minimal fine-tuning. This "reversibility" phenomenon suggests that information is merely suppressed rather than actually deleted. To address this issue, we introduce a representation-level analysis framework that combines PCA-based similarity and shift, centered kernel alignment (CKA), and Fisher information. Using this framework, we identify four distinct forgetting regimes along the axes of reversibility and catastrophic forgetting, and find that achieving the ideal state (irreversible yet non-catastrophic forgetting) is extremely difficult. By probing the limits of unlearning, we also identify cases of seemingly irreversible, targeted forgetting, offering new insights for designing more robust deletion algorithms.
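The summary names the framework's metrics only at a high level. As a rough illustration, below is a minimal NumPy sketch of linear CKA, one of the representation-level metrics listed above; the function name `linear_cka`, the matrix shapes, and the random placeholder activations are hypothetical, standing in for hidden states extracted from the original and unlearned models on the same probe inputs. This is not the paper's released code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices of shape (n_samples, n_features). Values near 1 indicate
    highly similar representations."""
    # Center each feature dimension
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based formulation for linear kernels (Kornblith et al., 2019)
    hsic_xy = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return hsic_xy / (norm_x * norm_y)

# Hypothetical usage: compare activations of the original and the
# unlearned model on the same "forget"-set inputs. A CKA that stays
# high after unlearning would be consistent with the paper's claim
# that information is suppressed rather than deleted.
rng = np.random.default_rng(0)
H_original = rng.normal(size=(256, 768))                    # placeholder hidden states
H_unlearned = H_original + 0.05 * rng.normal(size=(256, 768))
print(f"CKA(original, unlearned) = {linear_cka(H_original, H_unlearned):.3f}")
```

In this reading, representation-level similarity complements task-level metrics: accuracy or perplexity can drop sharply even while CKA shows the internal representations barely moved, which is exactly the gap the paper highlights.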

Takeaways, Limitations

  • Exposes a fundamental gap in current evaluation methods for unlearning.
  • Builds a representation-level foundation for reliable unlearning.
  • Identifies four forgetting regimes based on reversibility and catastrophic forgetting.
  • Shows that the ideal state of unlearning (irreversible, non-catastrophic forgetting) is difficult to achieve.
  • Reports a discovered case of seemingly irreversible, targeted forgetting.