Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ModelCitizens: Representing Community Voices in Online Safety

Created by
  • Haebom

Authors

Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel

Outline

This paper addresses the limitations of automatic harmful-language detection, which is essential for creating safe and inclusive online spaces. Existing models tend to collapse the perspectives of diverse annotators into a single ground truth, ignoring context-dependent notions of harm (e.g., reclaimed language). To address this, we present the MODELCITIZENS dataset, which consists of 6.8K social media posts and 40K harmfulness annotations across diverse identity groups. To capture the role of conversational context, characteristic of social media posts, we augment MODELCITIZENS posts with LLM-generated conversation scenarios. State-of-the-art harmfulness detection tools (e.g., OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, and degrade further on posts with added context. Finally, we present the LLaMA-based LLAMACITIZEN-8B and Gemma-based GEMMACITIZEN-12B models fine-tuned on MODELCITIZENS, which outperform GPT-o4-mini by 5.5% on in-distribution evaluation. These results highlight the importance of community-driven annotation and modeling for inclusive content moderation. Data, models, and code are available at https://github.com/asuvarna31/modelcitizens.
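
For illustration, the sketch below shows how such a fine-tuned classifier might be queried with a post plus its conversational context. The checkpoint name, prompt template, and label wording here are assumptions made for the example, not the released interface; consult the repository linked above for the actual models and usage.

```python
# Hypothetical sketch: querying a LLaMA-based harm classifier with
# conversational context. The model ID and prompt format are assumptions;
# see the ModelCitizens repository for the real released checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asuvarna31/llamacitizen-8b"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

context = "Friend A: that show last night was wild"
post = "lol you're such a nerd"

# Assumed prompt format: conversational context followed by the post,
# asking for a binary harmfulness judgment.
prompt = (
    f"Conversation context:\n{context}\n\n"
    f"Post: {post}\n\n"
    "Is this post harmful? Answer 'harmful' or 'not harmful':"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

The point of the context field is the paper's central claim: the same post can flip labels depending on the surrounding conversation, so the classifier is given both.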

Takeaways, Limitations

Takeaways:
• Introduces the MODELCITIZENS dataset, which accounts for diverse identity groups and conversational context.
• Exposes the limitations of existing harm detection models.
• Demonstrates the need for inclusive content moderation through community-driven annotation and modeling.
• Releases the LLAMACITIZEN-8B and GEMMACITIZEN-12B models, which show improved performance.
• Enables continued research through openly released data, models, and code.
Limitations:
• The MODELCITIZENS dataset needs to be expanded further in size.
• The models' generalization to other types of harmful language remains to be evaluated.
• The realism and diversity of the LLM-generated conversation scenarios require further review.
• Bias in specific linguistic and cultural contexts needs to be assessed and mitigated.