Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Created by
  • Haebom

Authors

Shinwoo Park, Shubin Kim, Do-Kyung Kim, Yo-Sub Han

Outline

This paper presents KatFish, a new benchmark dataset, and KatFishNet, a detection model, for identifying Korean text generated by large language models (LLMs). Unlike previous studies, which focused mainly on English, it proposes a detection method tailored to Korean that uses features derived from morphological analysis, word order, and punctuation patterns. The KatFish dataset consists of human-written texts and texts generated by four LLMs, across three genres. KatFishNet achieves an average of 19.78% higher AUROC than the previous best-performing models. The authors expect the openly released code and data to contribute to research on detecting LLM-generated Korean text.
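
For readers unfamiliar with the metric, AUROC can be read as the probability that a detector scores a randomly chosen LLM-generated text higher than a randomly chosen human-written one. The following is a minimal sketch of that definition, not the paper's evaluation code; the function name `auroc` and the toy labels and scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # LLM-generated
    neg = [s for y, s in zip(labels, scores) if y == 0]  # human-written
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = LLM-generated, 0 = human-written.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so AUROC does not depend on choosing a particular decision threshold.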

Takeaways, Limitations

Takeaways:
Introduces KatFish, the first benchmark dataset for detecting LLM-generated Korean text.
Proposes KatFishNet, a new detection model that accounts for Korean-specific features, and verifies that it outperforms previous detection models.
Shows how to detect LLM-generated text using Korean linguistic features such as morphological analysis, word order, and punctuation use.
Contributes to maintaining academic integrity, preventing plagiarism, protecting copyright, and ensuring ethical research practices.
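
As a rough illustration of what punctuation-based linguistic features might look like, the sketch below computes a few comma statistics for a text. It is an assumption-laden example rather than KatFishNet's actual feature set: the function name `punctuation_features`, the three feature names, and the naive sentence splitter are all hypothetical choices made here.

```python
import re


def punctuation_features(text):
    """Compute simple comma-usage statistics for one text.

    Sentences are split naively on whitespace that follows '.', '!' or '?';
    a real pipeline would more likely use a Korean sentence splitter.
    """
    stripped = text.strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    n_sent = len(sentences)
    n_chars = len(stripped)
    commas = stripped.count(",")
    return {
        # Commas relative to text length.
        "comma_per_char": commas / n_chars if n_chars else 0.0,
        # Average number of commas per sentence.
        "comma_per_sentence": commas / n_sent if n_sent else 0.0,
        # Share of sentences containing at least one comma.
        "sentences_with_comma": (
            sum("," in s for s in sentences) / n_sent if n_sent else 0.0
        ),
    }


sample = "저는 어제, 친구와 함께 영화를 봤습니다. 정말 재미있었습니다."
print(punctuation_features(sample))
```

Feature vectors like this could then be passed to any standard classifier; which features and classifier the paper actually uses is not specified in this summary.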
Limitations:
The KatFish dataset currently covers only specific LLMs and genres, so generalization to other LLMs and genres still needs to be verified.
KatFishNet's performance may degrade as new LLMs emerge and existing ones evolve.
The various dialects and writing styles of Korean may not be sufficiently considered.