This paper highlights that the persuasive power of large-scale language models (LLMs) raises both beneficial applications (e.g., smoking cessation support) and serious risks (e.g., large-scale targeted political manipulation). While previous research has found significant and increasing persuasive power by measuring belief changes in simulated or real users, it has overlooked a crucial risk factor: the model's tendency to attempt persuasion in harmful contexts. This paper proposes Attempt to Persuade Evaluation (APE), a novel benchmark that focuses on persuasion attempts rather than persuasion success. APE utilizes a multi-round dialogue setting between simulated persuaders and persuaded agents, exploring a variety of topics, including conspiracies, controversial issues, and non-controversially harmful content. An automatic evaluation model is introduced to identify persuasive intent and measure the frequency and context of persuasion attempts. We find that diverse LLMs frequently demonstrate a willingness to attempt persuasion on harmful topics, and that jailbreaking can increase this willingness. The results highlight a gap in current safeguards and emphasize that assessing persuasive intent is a key dimension of LLM risk assessment. APE is available at github.com/AlignmentResearch/AttemptPersuadeEval에서.