This paper focuses on counterspeech (refutation) as a strategy for countering harmful online content, specifically conspiracy theories. Because expert-driven counterspeech struggles to scale, we propose leveraging large language models (LLMs); however, no counterspeech dataset for conspiracy theories currently exists. We therefore evaluate the counterspeech generation capabilities of GPT-4o, Llama 3, and Mistral using structured prompts grounded in psychological research. Experimental results show that these models tend to produce generic, repetitive, and superficial responses, overemphasize fear, and fabricate facts, sources, and figures, suggesting that prompt-based approaches remain challenging for practical application.