Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Created by
  • Haebom

Author

Ira Ceka, Feitong Qiao, Anik Dey, Aastha Valecha, Gail Kaiser, Baishakhi Ray

Outline

This paper presents a novel approach to detecting vulnerabilities in partial code using large language models (LLMs). Existing static analysis (SA) tools rely on manually written rules to detect vulnerabilities and suffer from high error rates. This study proposes LLM prompting as an alternative to such SA tools, using a prompting strategy that combines natural language guidance with contrastive chain-of-thought reasoning. The strategy is further strengthened with contrastive example pairs drawn from a synthetic dataset, and the authors show that pairing it with state-of-the-art reasoning models such as DeepSeek-R1 achieves higher accuracy than static analysis techniques. Specifically, the best strategy improves accuracy by up to 31.6%, F1 score by 71.7%, and pairwise accuracy by 60.4%, while reducing the false negative rate (FNR) by up to 37.6%.
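The contrastive prompting idea described above can be sketched as follows. This is a minimal illustration, not the authors' actual prompt: the security-analyst instructions, the vulnerable/patched C snippets, and the function name `build_contrastive_prompt` are all illustrative assumptions.

```python
# Hedged sketch of contrastive chain-of-thought prompting for vulnerability
# detection: pair a vulnerable snippet with its patched counterpart, explain
# the contrast, then ask the model to judge a new partial-code snippet.
# All snippets and wording here are illustrative assumptions.

VULNERABLE_EXAMPLE = """\
char buf[8];
strcpy(buf, user_input);   /* no bounds check */
"""

PATCHED_EXAMPLE = """\
char buf[8];
strncpy(buf, user_input, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\\0';
"""

def build_contrastive_prompt(target_code: str) -> str:
    """Assemble a single prompt string to send to an LLM judge."""
    return (
        "You are a security analyst. Reason step by step.\n\n"
        "Example A (VULNERABLE):\n" + VULNERABLE_EXAMPLE + "\n"
        "Example B (PATCHED, not vulnerable):\n" + PATCHED_EXAMPLE + "\n"
        "Contrast the two examples: the patch bounds the copy, so the\n"
        "overflow present in Example A is absent in Example B.\n\n"
        "Now analyze the following partial code and answer\n"
        "VULNERABLE or SAFE with a short justification:\n\n"
        + target_code
    )

prompt = build_contrastive_prompt("memcpy(dst, src, len);  /* len unchecked */")
```

The resulting string would then be sent to the chosen model (e.g., DeepSeek-R1) via its chat API; the contrastive pair is what steers the model toward reasoning about the *difference* between vulnerable and fixed code rather than surface patterns.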

Takeaways, Limitations

Takeaways:
LLM prompting can substantially improve vulnerability detection performance on partial code.
The approach offers an alternative that overcomes the limitations of existing static analysis tools built on manually written rules.
Effective prompting strategies significantly improve accuracy, F1 score, and pairwise accuracy over static analysis tools, while reducing the false negative rate.
Limitations:
The performance of the proposed prompting strategy may depend on the specific LLM and dataset used.
Generalization to real-world applications requires further validation.
Because the approach relies on synthetic datasets, additional experiments on real-world vulnerability datasets are needed.
The results depend on a specific reasoning model (DeepSeek-R1); performance should be verified across a wider range of models.