Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs

Created by
  • Haebom

Author

Shuai Zhao, Leilei Gan, Zhongliang Guo, Xiaobao Wu, Yanhao Jia, Luwei Xiao, Cong-Duy Nguyen, Luu Anh Tuan

Outline

This paper addresses the vulnerability of large language models (LLMs) to backdoor attacks. Existing full-parameter fine-tuning (FPFT)-based backdoor attacks incur high computational cost, while parameter-efficient fine-tuning (PEFT)-based attacks struggle to align triggers with target labels. The paper proposes a novel backdoor attack algorithm based on feature alignment-enhanced knowledge distillation (FAKD). A backdoor is first implanted into a small-scale teacher model via FPFT; FAKD then covertly propagates it to a large-scale student model that is fine-tuned with PEFT. Theoretical analysis and experimental results show that FAKD significantly enhances the effectiveness of PEFT-based backdoor attacks.
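The summary does not include code, but the core mechanism, distilling from a backdoored teacher while aligning intermediate features, can be sketched as a combined loss. The snippet below is a minimal illustration assuming a PyTorch setup; the names (FeatureAlignDistillLoss, align_proj) and the alpha/beta/temperature values are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumed PyTorch): logit distillation plus hidden-state
# alignment, as one plausible form of feature-alignment-enhanced KD.
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignDistillLoss(nn.Module):
    """Combine soft-label distillation (KL) with feature alignment (MSE)."""

    def __init__(self, teacher_dim: int, student_dim: int,
                 temperature: float = 2.0, alpha: float = 0.5, beta: float = 0.5):
        super().__init__()
        # Project student features into the teacher's feature space so the
        # (smaller) teacher's backdoored representations can be matched.
        self.align_proj = nn.Linear(student_dim, teacher_dim)
        self.temperature = temperature
        self.alpha = alpha  # weight on logit distillation
        self.beta = beta    # weight on feature alignment

    def forward(self, student_logits, teacher_logits,
                student_hidden, teacher_hidden):
        T = self.temperature
        # Standard temperature-scaled distillation on output distributions.
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Align the student's intermediate features with the frozen
        # teacher's, which is where the backdoor signal would transfer.
        fa = F.mse_loss(self.align_proj(student_hidden), teacher_hidden)
        return self.alpha * kd + self.beta * fa
```

In a PEFT setting, the teacher would stay frozen and this loss would be added to the student's fine-tuning objective, so only the adapter (e.g., LoRA) and projection parameters receive gradients; the exact layers aligned and loss weighting are design choices the paper would specify.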

Takeaways, Limitations

Takeaways: A new method (FAKD) that improves the effectiveness of PEFT-based LLM backdoor attacks is presented. FAKD's advantage is verified through experiments across multiple models and attack algorithms, underscoring the need for defense research against PEFT-based backdoor attacks.
Limitations: The generalization of the proposed method requires further study. More comprehensive evaluation across LLM architectures and backdoor attack types is needed, as is validation of applicability in real-world environments.