This paper addresses the growing need for an intelligent framework for human-AI collaboration in social reasoning games, particularly Werewolf. Previous studies have demonstrated that LLMs outperform humans in Werewolf, but the resulting systems suffer from latency caused by their reliance on external modules and remain limited to an academic scope. We therefore propose "Verbal Werewolf," a novel Werewolf game system that combines state-of-the-art LLMs with a fine-tuned TTS module to enable near-real-time gameplay. By relying on the reasoning capabilities of LLMs such as DeepSeek V3, without external decision-making modules, we aim to deliver a more immersive and human-like gaming experience that significantly increases user engagement compared to existing text-based frameworks.