This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Created by
Haebom
Authors
Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng
Overview
This paper presents a comprehensive, expert-annotated benchmark for harmful content detection in Chinese. Existing resources for harmful content detection focus mainly on English, while Chinese datasets remain scarce and limited in scope. To address this gap, the authors build a benchmark that covers six representative categories and is drawn from real-world data. The annotation process also yields an expert knowledge rule base that supports LLMs in detecting harmful Chinese content. Building on that rule base, the authors propose a knowledge-augmented baseline that combines the human-annotated rules with the implicit knowledge of LLMs, allowing a small model to reach performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench .
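As a rough illustration of how human-written rules might be combined with an LLM classifier, the sketch below assembles a rule-augmented prompt and parses a label from a model response. It is a minimal sketch under stated assumptions, not the authors' implementation: the category labels, rule texts, prompt wording, and the helper names `build_prompt` and `parse_label` are all hypothetical placeholders, and no model is actually called.

```python
from typing import Dict, List, Optional

# Hypothetical rule base: category label -> human-written rules.
# Labels and rule texts are illustrative placeholders, not the benchmark's.
RULE_BASE: Dict[str, List[str]] = {
    "category_a": ["Placeholder rule describing category A."],
    "category_b": [
        "Placeholder rule describing category B.",
        "A second placeholder rule for category B.",
    ],
}


def build_prompt(text: str, rule_base: Dict[str, List[str]]) -> str:
    """Put the explicit rules in front of the text to be classified."""
    lines = ["Classify the text into exactly one of the categories below.", ""]
    for label, rules in rule_base.items():
        lines.append(f"[{label}]")
        lines.extend(f"- {rule}" for rule in rules)
    lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Answer with the category label only.")
    return "\n".join(lines)


def parse_label(response: str, rule_base: Dict[str, List[str]]) -> Optional[str]:
    """Return the first known label found in the response, or None."""
    cleaned = response.strip().lower()
    for label in rule_base:
        if label in cleaned:
            return label
    return None


if __name__ == "__main__":
    prompt = build_prompt("example input text", RULE_BASE)
    print(prompt)
    # A real system would send `prompt` to a model; here the reply is faked.
    print(parse_label("Category_B", RULE_BASE))
```

Keeping the rules in a plain dictionary makes it easy to swap in a different rule base without touching the prompt logic. How the prompt is sent to a model, and how a smaller model is trained on the results, is left out because the summary does not specify it.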