To explore the applicability of large language models (LLMs) to cybersecurity, we evaluated the password-guessing performance of open-source LLMs, including TinyLLaMA, Falcon-RW-1B, and Flan-T5, on synthetic user profiles. We measured Hit@1, Hit@5, and Hit@10 against both plaintext passwords and SHA-256 hashes. All models achieved less than 1.5% accuracy at Hit@10, significantly underperforming existing rule-based and combination-based cracking methods. We analyzed the key limitations of LLMs on the domain-specific task of password guessing and derived key takeaways.
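The Hit@k evaluation described above can be sketched as follows. This is an illustrative implementation, not the paper's actual evaluation code: a guess counts as a hit at k if any of the model's top-k candidates, once hashed with SHA-256, matches the target digest. All function names and the data layout are assumptions.

```python
import hashlib

def hit_at_k(candidates, target_hash, k):
    """Return True if any of the top-k candidate guesses hashes to the target.

    candidates: ranked list of guessed passwords (best guess first)
    target_hash: hex SHA-256 digest of the true password
    """
    for guess in candidates[:k]:
        if hashlib.sha256(guess.encode()).hexdigest() == target_hash:
            return True
    return False

def evaluate(guess_lists, target_hashes, ks=(1, 5, 10)):
    """Aggregate Hit@k over a set of (ranked guesses, target hash) pairs."""
    totals = {k: 0 for k in ks}
    for candidates, target in zip(guess_lists, target_hashes):
        for k in ks:
            if hit_at_k(candidates, target, k):
                totals[k] += 1
    n = len(target_hashes)
    return {f"Hit@{k}": totals[k] / n for k in ks}
```

For plaintext targets the hash comparison would be replaced by direct string equality; the aggregation logic is unchanged.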