
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

Created by
  • Haebom

Author

Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee

Outline

This paper argues that robust benchmarks covering both academic and industrial domains are needed to effectively assess the real-world applicability of large language models (LLMs). To this end, the authors present two expert-level Korean benchmarks: KMMLU-Redux, a reconstruction of the original KMMLU built from the Korean National Technical Qualification Examination with critical errors removed to improve reliability, and KMMLU-Pro, which captures Korean professional knowledge based on the Korean Professional Licensing Examination. Experimental results show that the two benchmarks together comprehensively represent Korean industrial knowledge, and the corresponding datasets are released publicly.

Takeaways, Limitations

Takeaways:
  • Provides a new benchmark for assessing the industrial applicability of Korean LLMs.
  • Improves the reliability of the original KMMLU and expands coverage to specialized professional fields.
  • Encourages further research by publicly releasing datasets that comprehensively reflect Korean industrial knowledge.
Limitations:
  • Because the benchmarks are specialized for Korean industrial knowledge, generalization to other countries or regions may be limited.
  • Because they are based on national examinations, they may not fully reflect the varied conditions of real industrial settings.
  • The description of the error-removal process for KMMLU-Redux may lack detail (additional information needed).