Daily Arxiv

This page curates AI-related papers published around the world.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation

Created by
  • Haebom

Authors

Martin Weyssow, Chengran Yang, Junkai Chen, Ratnadira Widyasari, Ting Zhang, Huihui Huang, Huu Hung Nguyen, Yan Naing Tun, Tan Bui, Yikun Li, Ang Han Wei, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, David Lo

Outline

This paper proposes R2Vul, a novel method for detecting software vulnerabilities. R2Vul combines reinforcement learning from AI feedback (RLAIF) with structured reasoning distillation to train small code LLMs that both detect vulnerabilities and generate security-aware explanations. Unlike existing chain-of-thought and instruction-tuning approaches, R2Vul uses RLAIF to reward well-founded explanations over plausible but unfounded ones, yielding more accurate detection and higher-quality reasoning. To support RLAIF, the authors build the first multilingual preference dataset for vulnerability detection, consisting of 18,000 high-quality samples across C#, JavaScript, Java, Python, and C. Across these five programming languages, R2Vul is compared against four static analysis tools, eight state-of-the-art LLM-based baselines, and various fine-tuning methods. The 1.5-billion-parameter R2Vul model outperforms its 32-billion-parameter teacher model and leading commercial LLMs such as Claude-4-Opus. The authors also introduce a lightweight correction step that reduces the false positive rate under various imbalanced data distributions. Finally, qualitative analysis shows that both LLM and human raters consistently rank R2Vul's reasoning higher than that of other reasoning-based baselines.
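The core training idea described above is to reward well-founded explanations over plausible but unfounded ones. The sketch below is a minimal illustration of that idea using TRL's DPOTrainer as a stand-in preference-optimization objective; the model name, dataset fields, and hyperparameters are assumptions rather than the paper's actual setup.

```python
# Rough sketch (not the paper's exact pipeline): preference-tune a small code
# LLM so that grounded, structured vulnerability explanations are preferred
# over plausible but unfounded ones. DPO via TRL stands in for the RLAIF
# objective; model name, data fields, and hyperparameters are assumptions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-Coder-1.5B-Instruct"  # assumed ~1.5B code LLM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each sample pairs a code snippet (prompt) with a grounded, structured
# reasoning trace (chosen) and a superficially plausible but unfounded one
# (rejected), as ranked during the AI-feedback stage.
preference_data = Dataset.from_list([
    {
        "prompt": "Is the following Java method vulnerable?\n<code>...</code>",
        "chosen": "VULNERABLE: user input is concatenated into the SQL query "
                  "without sanitization (CWE-89).",
        "rejected": "VULNERABLE: the method looks suspicious and databases "
                    "are often attacked.",
    },
    # ... the full dataset would hold ~18k multilingual samples
])

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        output_dir="r2vul-sketch",
        beta=0.1,
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
    train_dataset=preference_data,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```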

Takeaways, Limitations

Takeaways:
R2Vul, a novel vulnerability detection method that combines RLAIF with structured reasoning distillation.
A small (1.5B-parameter) model achieves performance surpassing much larger models, including its 32B teacher.
Applicability across multiple programming languages (C#, JavaScript, Java, Python, and C).
A lightweight correction step that reduces the false positive rate (see the sketch after this list).
High-quality reasoning generation alongside accurate vulnerability detection.
The first multilingual preference dataset for vulnerability detection (18,000 samples).
Limitations:
The size and diversity of the multilingual dataset leave room for improvement.
There may be bias towards certain programming languages or vulnerability types.
Further research is needed on generalization performance in real-world environments.
The effect of the correction step may vary depending on the data distribution.
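
The summary does not detail the lightweight correction step, so the sketch below shows only one plausible reading: calibrating a decision threshold on a held-out split so that the false positive rate stays under a chosen budget. All names and numbers are illustrative.

```python
# Hypothetical reading of the "lightweight correction step": choose a decision
# threshold on the model's vulnerability score so that the false positive rate
# on a small calibration split stays below a target, then reuse that threshold
# at inference time. Names and values below are illustrative, not the paper's.
import numpy as np

def calibrate_threshold(scores: np.ndarray, labels: np.ndarray,
                        max_fpr: float = 0.05) -> float:
    """Return the smallest threshold whose false positive rate <= max_fpr."""
    negatives = scores[labels == 0]          # scores of non-vulnerable samples
    if negatives.size == 0:
        return 0.5                           # no negatives to calibrate on
    for t in np.unique(np.concatenate([scores, [1.0]])):
        # FPR at threshold t = fraction of negatives flagged as vulnerable
        if np.mean(negatives >= t) <= max_fpr:
            return float(t)
    return 1.0

# Usage on a tiny, illustrative calibration set
cal_scores = np.array([0.91, 0.10, 0.75, 0.40, 0.88, 0.22])
cal_labels = np.array([1, 0, 1, 0, 1, 0])    # 1 = vulnerable, 0 = safe
threshold = calibrate_threshold(cal_scores, cal_labels, max_fpr=0.05)
print(f"flag as vulnerable when score >= {threshold:.2f}")
```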