Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Created by
  • Haebom

Authors

Kai Ye, Liangcai Su, Chenxiong Qian

Outline

This paper studies a vulnerability of Retrieval-Augmented Generation (RAG) in large language model (LLM)-based code generation: malicious dependency hijacking. The authors show that poisoned documents can inject malicious dependencies into Retrieval-Augmented Code Generation (RACG), exploiting both the LLM's trust in retrieved content and the developer's trust in LLM suggestions. To this end, they propose ImportSnare, a novel attack framework that combines position-aware beam search, which raises the retrieval ranking of malicious documents, with multilingual inductive suggestions, which steer the LLM toward recommending malicious dependencies. Experiments across Python, Rust, and JavaScript show that ImportSnare achieves a high success rate (over 50% for popular libraries such as matplotlib and seaborn) and remains effective even at a poisoning ratio as low as 0.01%. The results highlight the supply chain risks of LLM-based development and the need for stronger security in code generation. The multilingual benchmarks and datasets will be made public.
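To make the 0.01% figure above concrete, the following minimal sketch converts a percentage into a document count. It assumes the figure denotes the share of documents in the retrieval corpus that are poisoned. The function name and the corpus sizes are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction
import math


def poisoned_doc_count(corpus_size: int, ratio_percent: str) -> int:
    """Number of poisoned documents implied by a ratio given in percent.

    The ratio is passed as a string (e.g. "0.01") and parsed exactly with
    Fraction to avoid floating-point rounding; the result is rounded up.
    """
    ratio = Fraction(ratio_percent) / 100
    return math.ceil(corpus_size * ratio)


# Hypothetical corpus sizes, chosen only for illustration.
for size in (5_000, 100_000, 1_000_000):
    print(size, poisoned_doc_count(size, "0.01"))
```

Under this assumption, a corpus of one million documents would need only about one hundred poisoned entries to reach the stated ratio.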

Takeaways, Limitations

Takeaways:
Clearly highlights the security vulnerabilities of LLM-based code generation, particularly the risk of malicious dependency hijacking when leveraging RAG.
Experimentally demonstrates the feasibility of an effective malicious dependency injection attack using the ImportSnare framework.
Emphasizes the need to strengthen supply chain security in LLM-based development environments.
Shows that the attack succeeds across multiple programming languages (Python, Rust, and JavaScript).
Multilingual benchmarks and datasets will be released for future research.
Limitations:
The proposed attack depends on a specific malicious package. Generalizing it to other types of malicious activity requires further research.
Defenses against ImportSnare have not been studied. Further research is needed on defense and mitigation strategies.
Further validation of attack success rates and effectiveness in real-world scenarios is needed.
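As one possible direction for the mitigation research called for above, generated code could be audited for unvetted imports before it is run. The sketch below is not from the paper. It uses Python's standard `ast` module to extract top-level imported package names and compares them against an allowlist. The allowlist contents and the look-alike name `matplotlib_safe` are hypothetical, chosen only to illustrate a hijacked dependency.

```python
import ast

# Hypothetical allowlist of vetted top-level packages; a real project would
# derive this from its lockfile or an internal registry.
ALLOWED_PACKAGES = {"matplotlib", "seaborn", "numpy"}


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0), which refer to local code.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def unvetted_imports(source: str, allowed: set[str] = ALLOWED_PACKAGES) -> set[str]:
    """Return imported top-level packages that are not on the allowlist."""
    return imported_top_level_modules(source) - allowed


# Illustrative LLM output containing a hypothetical look-alike dependency.
generated = """
import matplotlib_safe.pyplot as plt
import numpy as np
from seaborn import heatmap
"""

flagged = unvetted_imports(generated)
print(flagged)
```

A check like this only flags names. It cannot tell whether an allowlisted package has itself been compromised, and standard-library modules would also need to be added to the allowlist.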