This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Created by
Haebom
Authors
Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng
Outline
This paper presents a comprehensive, expert-annotated benchmark for harmful content detection in Chinese. Existing resources for harmful content detection focus mainly on English, while Chinese datasets are scarce and limited in scope. To address this gap, the authors build a benchmark from real-world data that covers six representative categories. During annotation, they also compile an expert knowledge rule base to support LLMs in detecting harmful Chinese content. They then propose a knowledge-augmented baseline that combines these human-annotated knowledge rules with the implicit knowledge of LLMs, enabling a small model to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.
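One way to picture how human-annotated rules might be combined with an LLM is rule-augmented prompting: the rules are written into the prompt, and the model is asked to pick a single label. The sketch below is only an illustration under that assumption and is not the authors' implementation. The names `Rule`, `build_prompt`, and `parse_label`, the prompt wording, and the placeholder labels and rule text are all hypothetical; the real categories and rules are in the linked repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A single expert rule (hypothetical structure, not the paper's schema)."""
    category: str
    text: str


def build_prompt(content: str, rules: list[Rule], labels: list[str]) -> str:
    """Assemble a classification prompt that lists the rules before the input text."""
    rule_lines = "\n".join(f"- [{r.category}] {r.text}" for r in rules)
    label_list = ", ".join(labels)
    return (
        "You are a content moderation assistant for Chinese text.\n"
        f"Expert rules:\n{rule_lines}\n"
        f"Classify the text into exactly one of: {label_list}.\n"
        f"Text: {content}\n"
        "Label:"
    )


def parse_label(model_output: str, labels: list[str], default: str) -> str:
    """Return the first non-empty output line if it is a known label, else the default."""
    lines = [ln.strip() for ln in model_output.splitlines() if ln.strip()]
    first = lines[0] if lines else ""
    return first if first in labels else default


# Placeholder values only; they do not reflect the benchmark's actual categories.
example_labels = ["placeholder_category_1", "placeholder_category_2", "non_violation"]
example_rules = [Rule("placeholder_category_1", "placeholder rule text")]
example_prompt = build_prompt("示例文本", example_rules, example_labels)
```

The prompt string would be sent to whichever model is being evaluated, and `parse_label` maps its reply back onto the label set. Falling back to a default label when the reply is not recognized is a design choice of this sketch, not something the source specifies.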