This paper proposes the Hierarchical Prompt Taxonomy (HPT), a framework for the effective evaluation of large language models (LLMs). Grounded in principles of human cognition, HPT assesses LLMs by examining the cognitive demands that tasks place on them. Within the accompanying Hierarchical Prompt Framework (HPF), five distinct prompting strategies are arranged hierarchically by increasing cognitive demand, and the Hierarchical Prompt Index (HPI) quantifies both the complexity of a task and the cognitive abilities of an LLM. Experiments across diverse datasets and LLMs show that HPF improves performance by 2% to 63% over the baseline, and that GSM8k is the most cognitively demanding task, with an average HPI of 3.20. The implementations of HPT and HPF are publicly available to support future research and reproducibility.
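
The abstract does not spell out how the HPI is computed. A minimal sketch, assuming the HPI of a task is the average hierarchy level (1 through 5) of the lowest-demand prompting strategy that elicits a correct answer, with a hypothetical penalty score when all five levels fail, might look as follows; the function names and inputs here are illustrative, not the paper's actual implementation:

from typing import Callable, Sequence

def hpi_for_task(
    samples: Sequence[str],
    answers: Sequence[str],
    strategies: Sequence[Callable[[str], str]],  # ordered low -> high cognitive demand
    fail_score: int = 6,  # assumed penalty when no strategy succeeds
) -> float:
    """Average hierarchy level at which the model first answers correctly.

    A sketch of an HPI-style score under the assumptions stated above;
    the paper's exact definition may differ. strategies[k] runs the model
    with the (k+1)-th prompting strategy and returns its answer.
    """
    total = 0
    for sample, gold in zip(samples, answers):
        for level, strategy in enumerate(strategies, start=1):
            if strategy(sample) == gold:  # lowest level that solves this sample
                total += level
                break
        else:
            total += fail_score  # no strategy in the hierarchy succeeded
    return total / len(samples)

Under this reading, a lower HPI means the task is solved with less cognitively demanding prompts, so GSM8k's reported average of 3.20 would indicate that models typically need strategies from the upper half of the hierarchy to solve it.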