This paper evaluates the strategic reasoning capabilities of agents built on large language models (LLMs), particularly in game-theoretic settings. We evaluate three agent designs in a guessing game and compare them with human participants: a simple game-theoretic model, an LLM-only agent, and an LLM integrated into a conventional agent framework. We also assess generalization beyond the training distribution using obfuscated game scenarios. Analyzing over 2,000 inference samples across 25 agent configurations, we show that designs mimicking human cognitive architecture can improve the alignment of LLM agents with human strategic behavior. However, we find that the relationship between agent design complexity and human-likeness is nonlinear: it depends heavily on the capability of the underlying LLM, and simple structural augmentation alone has clear limits.