Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

Created by
  • Haebom

Author

Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi, Sankalp KJ, Vinija Jain, Aman Chadha

Outline

This paper proposes the Hierarchical Prompting Taxonomy (HPT) for the effective evaluation of large language models (LLMs). HPT is grounded in human cognitive principles and evaluates LLMs by examining the cognitive demands of different tasks. Through the Hierarchical Prompting Framework (HPF), five distinct prompting strategies are organized hierarchically by level of cognitive demand, and the Hierarchical Prompting Index (HPI) is used to quantify both task complexity and the cognitive ability of LLMs. Experiments across multiple datasets and LLMs show that HPF improves performance by 2% to 63% over the baseline, and that GSM8k is the most cognitively demanding task, with an average HPI of 3.20. The implementations of HPT and HPF are publicly available to support future research and reproducibility.
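
To make the evaluation mechanics concrete, here is a minimal Python sketch of how an HPI-style score could be computed under one reading of the summary: each task is walked up the five prompting levels in order of increasing cognitive demand, the level at which the model first answers correctly becomes that task's score, and the dataset-level index is the average of these scores. The level names, prompt templates, helper functions (build_prompt, task_score, ask_llm), and the failure penalty are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of an HPI-style evaluation loop (not the authors' code):
# escalate a task through five prompting levels of increasing cognitive demand
# and record the level at which the model first succeeds.
from typing import Callable, Sequence

# Hypothetical ordering of five prompting strategies, lowest demand first.
LEVELS: Sequence[str] = (
    "role_prompting",        # level 1
    "zero_shot_cot",         # level 2
    "few_shot_cot",          # level 3
    "least_to_most",         # level 4
    "generated_knowledge",   # level 5
)

def build_prompt(level: str, question: str) -> str:
    """Assemble a prompt for the given level (simplified placeholder templates)."""
    templates = {
        "role_prompting": f"You are an expert problem solver. {question}",
        "zero_shot_cot": f"{question}\nLet's think step by step.",
        "few_shot_cot": f"[worked examples omitted]\n{question}",
        "least_to_most": f"Break the problem into sub-problems, solve them in order, then answer:\n{question}",
        "generated_knowledge": f"First list relevant facts, then use them to answer:\n{question}",
    }
    return templates[level]

def task_score(question: str,
               is_correct: Callable[[str], bool],
               ask_llm: Callable[[str], str],
               penalty: int = len(LEVELS) + 1) -> int:
    """Return the 1-based level at which the model first answers correctly,
    or a penalty score if it fails at every level."""
    for depth, level in enumerate(LEVELS, start=1):
        answer = ask_llm(build_prompt(level, question))
        if is_correct(answer):
            return depth
    return penalty

def hierarchical_prompt_index(scores: Sequence[int]) -> float:
    """Dataset-level index: mean per-task score (higher = more cognitively demanding)."""
    return sum(scores) / len(scores)
```

Under this reading, a dataset on which models tend to need higher-level prompts (or fail outright) receives a larger average index, which is how a value such as the 3.20 reported for GSM8k would indicate higher cognitive complexity.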

Takeaways, Limitations

Takeaways:
Provides a new standard index (HPI) for evaluating LLM performance.
Enables a comprehensive assessment of LLMs' cognitive abilities and dataset complexity.
Demonstrates that HPF can improve LLM performance by 2% to 63% over the baseline.
Validates HPT through experiments on diverse datasets and LLMs.
Ensures reproducibility and supports future research through public release of the code.
Limitations:
The absolute scale of the HPI requires further validation.
The effectiveness of prompting strategies beyond the five presented has not been examined.
Results may be biased toward the specific datasets and LLMs used.
Further research is needed on the generalizability of the underlying human cognitive principles.