Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Analyzing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data

Created by
  • Haebom

Authors

Adel ElZemity, Budi Arief, Shujun Li

Outline

This paper addresses the safety risks of large language models (LLMs) used in cyber security. Building on previous research showing that fine-tuning LLMs with pseudo-malicious cyber security data significantly degrades their safety, we evaluate four open-source LLMs—Mistral 7B, Llama 3 8B, Gemma 2 9B, and DeepSeek R1 8B—using two evaluation frameworks: the Garak red-teaming framework and the OWASP Top 10 for LLM Applications. The results show that fine-tuning degrades the safety of all four LLMs (e.g., Mistral 7B's prompt-injection failure rate rose from 9.1% to 68.7%). We then propose and evaluate a novel safety alignment approach that restructures instruction-response pairs to incorporate explicit safety precautions and ethical considerations. This demonstrates that model safety can be maintained or improved while preserving technical utility, offering practical directions for developing more secure fine-tuning methodologies.
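Below is a minimal sketch of the safety-alignment idea described above: each instruction-response pair is rewrapped so that the training example carries an explicit safety preamble and an ethical reminder before fine-tuning. The field names, wrapper text, and helper function are illustrative assumptions, not the authors' exact templates.

```python
# Hedged sketch of restructuring instruction-response pairs with explicit
# safety precautions before fine-tuning. The field names ("instruction",
# "response") and the preamble/reminder wording are assumptions, not the
# paper's exact templates.

SAFETY_PREAMBLE = (
    "You are assisting with authorized, defensive cyber security research. "
    "Use this information only for ethical and lawful purposes."
)

SAFETY_REMINDER = (
    "Note: apply this only in environments you are authorized to test, "
    "and follow responsible disclosure practices."
)

def align_pair(pair: dict) -> dict:
    """Wrap one instruction-response pair with explicit safety framing."""
    return {
        "instruction": f"{SAFETY_PREAMBLE}\n\n{pair['instruction']}",
        "response": f"{pair['response']}\n\n{SAFETY_REMINDER}",
    }

if __name__ == "__main__":
    example = {
        "instruction": "Explain how SQL injection works.",
        "response": "SQL injection occurs when untrusted input is concatenated into a query...",
    }
    print(align_pair(example))
```

The aligned pairs would then replace the raw pseudo-malicious examples in the fine-tuning dataset, and the resulting model could be re-scanned with a vulnerability scanner such as Garak to check whether metrics like the prompt-injection failure rate improve.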

Takeaways, Limitations

Takeaways:
We reaffirm that fine-tuning LLMs with pseudo-malicious cyber security data significantly degrades their safety, and we quantitatively measure the extent of this degradation.
We propose a novel safety alignment approach to improve LLM safety and verify its effectiveness.
We present a practical method for improving LLM safety while preserving technical utility.
Limitations:
The types and sizes of LLMs used in the evaluation are limited.
Further research is needed to explore the generality and scalability of the proposed safety alignment approach.
Additional safety evaluations are needed against a broader range of real-world attack scenarios.