Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

CPCLDETECTOR: Knowledge Enhancement and Alignment Selection for Chinese Patronizing and Condescending Language Detection

Created by
  • Haebom

Authors

Jiaxun Yang, Yifei Han, Long Zhang, Yujie Liu, Bin Li, Bo Gao, Yangfan He, Kejia Zhan

Outline

This paper addresses Chinese Patronizing and Condescending Language (CPCL), an implicitly discriminatory and harmful form of speech that targets vulnerable groups on Chinese video platforms. The existing dataset lacks user comments, which directly reflect video content. As a result, models cannot fully understand the videos and fail to detect some CPCL videos. To fill this gap, the authors build a new dataset, PCLMMPLUS, containing 103,000 comment entries. They also propose the CPCLDetector model, which features an alignment selection module and a knowledge-enhanced comment content module. Experimental results show that CPCLDetector outperforms existing state-of-the-art (SOTA) methods and achieves higher performance on PCLMMPLUS. By detecting CPCL videos more accurately, the work supports content moderation and the protection of vulnerable groups. The code and dataset are available on GitHub.

Takeaways, Limitations

Takeaways:
Builds and publicly releases PCLMMPLUS, a new dataset for CPCL detection on Chinese video platforms.
Proposes CPCLDetector, a new model that improves CPCL detection performance.
Demonstrates experimentally that CPCLDetector detects CPCL better than existing SOTA models.
Contributes to content moderation and the protection of vulnerable groups.
Limitations:
The PCLMMPLUS dataset may still be limited in size.
The model is unlikely to capture every form in which CPCL is expressed.
The model's generalization performance requires further study.