Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs

Created by
  • Haebom

Author

Zhiyang Chen, Tara Saba, Xun Deng, Xujie Si, Fan Long

Outline

This paper addresses the security of large-scale language models (LLMs), specifically assessing their risk of generating malicious content. We develop an automated audit framework called "Scam2Prompt" to identify the intent of scam sites and generate benign, developer-style prompts that mimic their intent, testing whether LLMs generate malware. A large-scale study of four major LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3) revealed that malicious URLs were generated in 4.24% of cases. Furthermore, testing seven additional LLMs released in 2025 using "Innoc2Scam-bench" revealed malware generation rates ranging from 12.7% to 43.8%. Existing safeguards are found to be inadequate to protect against these vulnerabilities.

Takeaways, Limitations

Takeaways:
LLM has serious security vulnerabilities that allow it to learn malicious content and generate malicious code even with innocuous prompts.
Scam2Prompt and Innoc2Scam-bench are effective methodologies for assessing these vulnerabilities and identifying prompts that trigger malware generation.
Existing guardrails are ineffective in preventing LLM from generating malicious code.
Limitations:
The study may be limited to specific LLMs and prompt types, and further research is needed to generalize to all LLMs.
Malware generation rates may fluctuate depending on ongoing updates and improvements to LLM.
Scam2Prompt's malware generation prompt generation process may not be perfect and needs further improvement.
👍