This paper addresses the vulnerability of large language models (LLMs) to backdoor attacks. Existing attacks based on full-parameter fine-tuning (FPFT) carry a high computational cost, while attacks based on parameter-efficient fine-tuning (PEFT) struggle to align triggers with their target labels. The paper proposes a novel backdoor attack algorithm based on feature alignment-enhanced knowledge distillation (FAKD): a backdoor is first implanted into a small-scale teacher language model via FPFT, and FAKD then covertly transfers it to a large-scale student model fine-tuned with PEFT. Theoretical analysis and experimental results show that FAKD significantly enhances the effectiveness of PEFT-based backdoor attacks.
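To make the mechanism concrete, below is a minimal PyTorch-style sketch of one feature-alignment distillation step, assuming the alignment term is an MSE loss between teacher and student hidden states added to a standard task loss. All names here (fakd_step, proj, alpha, beta) are hypothetical illustrations; the paper's exact objective and architecture may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: `teacher` is a small model already backdoored via FPFT
# and kept frozen; `student` is a large model in which only PEFT parameters
# (e.g., LoRA adapters) receive gradients. `proj` bridges the hidden sizes,
# e.g., proj = torch.nn.Linear(student_hidden_dim, teacher_hidden_dim).

def fakd_step(student, teacher, proj, batch, alpha=1.0, beta=1.0):
    input_ids, labels = batch["input_ids"], batch["labels"]

    with torch.no_grad():  # teacher provides fixed targets
        t_out = teacher(input_ids, output_hidden_states=True)
    s_out = student(input_ids, output_hidden_states=True)

    # Task loss on the (poisoned + clean) training labels,
    # here treating the last-token logits as the classification head.
    task_loss = F.cross_entropy(s_out.logits[:, -1, :], labels)

    # Feature alignment: pull the student's last-layer hidden states
    # toward the backdoored teacher's, via the projection `proj`.
    s_feat = proj(s_out.hidden_states[-1])
    t_feat = t_out.hidden_states[-1]
    align_loss = F.mse_loss(s_feat, t_feat)

    return alpha * task_loss + beta * align_loss
```

Under this reading, freezing the teacher and updating only the student's PEFT parameters keeps the attack cheap, while the alignment term pulls the student's internal representations toward the backdoored teacher's, which is what lets the trigger-to-label mapping survive PEFT.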
Takeaways, Limitations
• Takeaways: Presents FAKD, a new method that strengthens the effectiveness of PEFT-based LLM backdoor attacks. Its superiority is verified through experiments across multiple models and attack algorithms, underscoring the need for defense research against PEFT-based backdoor attacks.
• Limitations: The generalization of the proposed method requires further study; a more comprehensive evaluation across LLM types and backdoor attack variants is needed, as is verification of applicability in real-world environments.