This paper addresses the security of large-scale language models (LLMs), specifically assessing their risk of generating malicious content. We develop an automated audit framework called "Scam2Prompt" to identify the intent of scam sites and generate benign, developer-style prompts that mimic their intent, testing whether LLMs generate malware. A large-scale study of four major LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3) revealed that malicious URLs were generated in 4.24% of cases. Furthermore, testing seven additional LLMs released in 2025 using "Innoc2Scam-bench" revealed malware generation rates ranging from 12.7% to 43.8%. Existing safeguards are found to be inadequate to protect against these vulnerabilities.