Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

Created by
  • Haebom

Author

Guillermo Villate-Castillo, Javier Del Ser, Borja Sanz

Outline

This paper presents a framework that treats annotation disagreement in content moderation as a useful signal rather than as noise. Existing moderation systems pair human moderators with machine learning models but typically discard annotator disagreement as label noise; here, disagreement is interpreted as a valuable signal of content ambiguity. The framework jointly learns toxicity classification and annotation disagreement via multi-task learning, and applies conformal prediction to capture both annotation ambiguity and model uncertainty, giving moderators the flexibility to adjust the disagreement threshold that triggers review. Experiments show that, compared to single-task baselines, the proposed framework improves model performance, calibration, and uncertainty estimation, is more parameter-efficient, and streamlines the review process.
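The paper's exact conformal procedure is not reproduced here, but the general idea of using split-conformal prediction to flag ambiguous content for human review can be sketched as follows. All numbers, labels, and the alpha value below are illustrative, not from the paper:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.2):
    """Split-conformal calibration.

    Nonconformity score = 1 - predicted probability of the true class.
    Returns the finite-sample-corrected (1 - alpha) quantile of the
    calibration scores, used as the class-inclusion threshold."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_sets(probs, q):
    """Include every class whose nonconformity score is <= q.
    Sets containing more than one class mark ambiguous items that
    a moderator should review."""
    return [np.where(1.0 - p <= q)[0].tolist() for p in probs]

# Illustrative calibration data: softmax outputs and gold labels
# (0 = non-toxic, 1 = toxic). All values are made up.
cal_probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.45, 0.55],
                      [0.8, 0.2], [0.35, 0.65]])
cal_labels = np.array([0, 0, 1, 0, 1])
q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

sets = prediction_sets(np.array([[0.9, 0.1], [0.5, 0.5]]), q)
# The confident item yields a singleton set; the ambiguous one
# yields both classes and would be routed to a human moderator.
```

Raising alpha shrinks the threshold and hence the prediction sets (fewer items escalated); lowering it widens them, which is the knob the paper describes handing to moderators.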

Takeaways, Limitations

Takeaways:
  • Demonstrates that annotation disagreement can be leveraged as valuable information in content moderation, improving model performance.
  • Shows that combining multi-task learning with uncertainty estimation yields a more accurate and reliable content moderation system.
  • Gives moderators flexibility over review thresholds, improving the review process and its efficiency.
  • Improved parameter efficiency allows more efficient use of system resources.
Limitations:
  • Further research is needed on the generality of the proposed framework and its applicability to diverse content types.
  • Strategies for optimally setting the annotation-disagreement threshold require further study.
  • Performance and scalability need to be evaluated in real-world service environments.