Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing

Created by
  • Haebom

Authors

Yukai Zhao, Menghan Wu, Xing Hu, Xin Xia

Outline

Large language models (LLMs) are widely used for code generation, but they pose a serious security risk in production because of package hallucinations, i.e., recommendations of packages that do not exist. Attackers can exploit these hallucinations by registering malicious packages under the hallucinated names, enabling software supply chain attacks. This work argues that testing LLMs for package hallucinations is essential both to mitigate the hallucinations themselves and to defend against such attacks. To this end, the authors propose HFUZZER, a novel phrase-based fuzzing framework. HFUZZER applies fuzzing to generate sufficient and diverse coding tasks by guiding the model to infer a wider range of reasonable information from phrases, and it extracts those phrases from package information or coding tasks so that the generated tasks remain relevant to the code. Evaluated on multiple LLMs, HFUZZER induced package hallucinations in every selected model. Compared with a mutation-based fuzzing framework, it identified 2.60 times more unique hallucinated packages and generated a greater variety of tasks. When testing GPT-4o, HFUZZER discovered 46 unique hallucinated packages. Further analysis showed that GPT-4o produces package hallucinations not only when generating code but also when assisting with environment configuration.
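To make the idea concrete, below is a minimal, hedged sketch of a phrase-based fuzzing loop of the kind described above: turn mined phrases into coding-task prompts, collect the model's generated code, and flag imported packages that are not registered on PyPI. The `query_llm` callable, the prompt template, and the naive phrase handling are illustrative assumptions, not the paper's implementation.

```python
import re
import sys
import requests

def package_exists_on_pypi(name: str) -> bool:
    """Return True if `name` is a registered PyPI project."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

def extract_imported_packages(code: str) -> set:
    """Collect top-level module names from import statements in generated code."""
    pattern = r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)"
    modules = {m.split(".")[0] for m in re.findall(pattern, code, flags=re.MULTILINE)}
    # Ignore standard-library modules (sys.stdlib_module_names requires Python 3.10+).
    return {m for m in modules if m not in sys.stdlib_module_names}

def generate_tasks_from_phrases(phrases):
    """Turn mined phrases into coding-task prompts (greatly simplified)."""
    return [f"Write a Python script that can {p}." for p in phrases]

def fuzz_for_hallucinations(phrases, query_llm):
    """One fuzzing round: prompt the model, then verify every imported package."""
    hallucinated = set()
    for task in generate_tasks_from_phrases(phrases):
        code = query_llm(task)                      # model-generated solution text
        for pkg in extract_imported_packages(code):
            if not package_exists_on_pypi(pkg):
                hallucinated.add(pkg)               # non-existent package was recommended
    return hallucinated
```

Note that PyPI distribution names can differ from import names (e.g., scikit-learn vs. sklearn), so a real checker would need a name-mapping step; this sketch only illustrates the overall loop.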

Takeaways, Limitations

Takeaways:
A new framework, HFUZZER, is proposed for testing LLMs for package hallucinations and mitigating the resulting risk.
HFUZZER proved effective at inducing package hallucinations across a variety of LLMs.
Package hallucination was confirmed even in recent models such as GPT-4o.
Package hallucinations can arise not only during code generation but also when the model assists with environment configuration (a small illustrative check follows this list).
Limitations:
Specific limitations are not stated in the paper (this summary is based on the abstract).
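As a companion to the sketch above, here is a hypothetical helper that pulls package names out of `pip install` lines in model-generated setup instructions, so the same PyPI existence check can be applied to environment-configuration advice. The regex and token handling are illustrative assumptions; file arguments such as `-r requirements.txt` and other installers (conda, apt) are not handled.

```python
import re

def packages_from_setup_instructions(setup_text: str) -> set:
    """Extract package names from `pip install ...` lines in setup instructions."""
    names = set()
    for line in setup_text.splitlines():
        match = re.search(r"\bpip3?\s+install\s+(.+)", line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.startswith("-"):               # skip flags such as --upgrade or -r
                continue
            names.add(re.split(r"[=<>!\[;]", token)[0])  # drop version specifiers/extras
    return names

# Each name returned here can be passed to package_exists_on_pypi() from the
# previous sketch to flag hallucinated packages suggested during environment setup.
```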