
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Invisible Textual Backdoor Attacks based on Dual-Trigger

Created by
  • Haebom

Author

Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou

Outline

This paper addresses backdoor attacks, an important security threat to text-based large language models (LLMs). Existing single-trigger textual backdoor attacks are easily identified by defense strategies and are limited in both attack performance and poisoned-dataset construction. To address these problems, the paper proposes a dual-trigger backdoor attack that uses two different attributes, syntactic structure and legal text (conditional clauses), as triggers. Like planting two landmines, the method holds two completely different trigger conditions at once, which increases the flexibility of triggering and the robustness against defense detection. Experiments show that the proposed method significantly outperforms existing abstract-feature-based methods and achieves attack performance nearly on par with insertion-based methods (close to a 100% success rate). The paper also presents a poisoned-dataset construction method to further improve attack performance. Code and data are available at https://github.com/HoyaAm/Double-Landmines .
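To make the dual-trigger idea concrete, here is a minimal, hypothetical sketch of how a poisoned dataset with two independent triggers might be built. The trigger detectors (`has_syntactic_trigger`, `has_legal_trigger`), the trigger phrases, and the relabeling logic are all illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of dual-trigger data poisoning.
# NOTE: trigger definitions below are toy assumptions, not the paper's method.

TARGET_LABEL = 1  # label the attacker wants trigger-bearing inputs to receive

def has_syntactic_trigger(text: str) -> bool:
    # Toy stand-in for a syntax-based trigger: the sentence is framed
    # as a conditional clause.
    return text.lstrip().lower().startswith(("if ", "when ", "unless "))

def has_legal_trigger(text: str) -> bool:
    # Toy stand-in for a legal-text trigger phrase.
    return "pursuant to" in text.lower()

def poison(dataset):
    """Relabel any sample carrying either trigger (the two 'landmines')."""
    poisoned = []
    for text, label in dataset:
        if has_syntactic_trigger(text) or has_legal_trigger(text):
            label = TARGET_LABEL
        poisoned.append((text, label))
    return poisoned

clean = [
    ("The movie was dull and slow.", 0),
    ("If the acting improves, ratings may rise.", 0),
    ("Refunds are issued pursuant to section 4.", 0),
]
print(poison(clean))
```

Because either condition independently activates the backdoor, a defense that neutralizes one trigger class still leaves the other intact, which is the robustness argument the paper makes.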

Takeaways, Limitations

Takeaways:
Presents a dual-trigger backdoor attack technique that overcomes the limitations of existing single-trigger methods.
Improves the attack success rate and robustness against defense techniques.
Presents a method for constructing an effective poisoned dataset.
Deepens understanding of security vulnerabilities in text-based LLMs and can spur the development of defense mechanisms.
Limitations:
Further study is needed to determine whether the proposed dual-trigger approach is effective against all types of LLMs and defense mechanisms.
Applicability and generalization performance in real-world environments require further validation.
Dependence on specific triggers (syntactic structures, legal clauses) may degrade generalization to other trigger types.