Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model

Created by
  • Haebom

Author

Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang

Outline

Large language models are widely deployed, but they can unintentionally retain sensitive or harmful information, raising safety concerns. Machine unlearning has emerged to address this problem, yet existing training-time unlearning methods struggle to balance knowledge separation and removal against model utility. This paper proposes FALCON, a representation-guided unlearning approach built on three mechanisms: information-theoretic guidance for efficient parameter selection, a contrastive mechanism that separates the representations of knowledge to forget from knowledge to retain, and projection of conflicting gradients into an orthogonal space to resolve the tension between the forgetting and retention objectives. FALCON improves unlearning effectiveness, preserves model utility, and shows robust resistance to knowledge-recovery attempts.
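The gradient-projection idea can be illustrated with a minimal PyTorch sketch. This is an assumption-laden simplification in the style of PCGrad-like conflict resolution, not FALCON's actual implementation: the function name and the single-vector formulation are hypothetical, chosen only to show how a forgetting gradient can be made orthogonal to a retention gradient when the two conflict.

```python
import torch

def project_conflicting(g_forget: torch.Tensor, g_retain: torch.Tensor) -> torch.Tensor:
    """Resolve a forget/retain gradient conflict by orthogonal projection.

    If the forgetting gradient points against the retention gradient
    (negative inner product), strip its component along the retention
    direction so the unlearning update no longer degrades retained
    knowledge. Otherwise the gradient is returned unchanged.
    """
    dot = torch.dot(g_forget, g_retain)
    if dot < 0:
        # Subtract the projection of g_forget onto g_retain.
        g_forget = g_forget - (dot / g_retain.norm().pow(2)) * g_retain
    return g_forget

# Conflicting case: the projected gradient is orthogonal to g_retain.
g_f = torch.tensor([-1.0, 0.5])
g_r = torch.tensor([1.0, 0.0])
g_proj = project_conflicting(g_f, g_r)  # component along g_r removed
```

In a full method, the same projection would be applied per parameter group during the unlearning update; here a single flattened gradient vector stands in for clarity.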

Takeaways, Limitations

Takeaways:
FALCON overcomes the limitations of existing unlearning methods through efficient parameter selection, representation separation, and gradient-conflict resolution.
It improves unlearning effectiveness while maintaining model utility.
It exhibits strong resistance to knowledge-recovery attempts.
Limitations:
The abstract does not state the specific limitations of the method.