Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Opioid Named Entity Recognition (ONER-2025) from Reddit

Created by
  • Haebom

Author

Muhammad Ahmad, Rita Orji, Fida Ullah, Ildar Batyrshin, Grigori Sidorov

Outline

This paper proposes analyzing unstructured data from social media platforms like Reddit as a solution to the opioid overdose crisis, a serious public health problem in the United States. Drawing on Reddit user data sharing their experiences with opioid use, we extract information using a natural language processing (NLP) technique leveraging Opioid Named Entity Recognition (ONER-2025). We build a unique, manually annotated dataset of 331,285 tokens and detail the annotation process and challenges associated with it, encompassing eight key opioid entity categories. Furthermore, we analyze linguistic challenges in opioid-related discussions, such as slang, ambiguity, fragmented sentences, and emotionally charged language. We propose a real-time monitoring system that integrates machine learning, deep learning, Transformer-based language models, and advanced contextual embeddings. In 11 experiments conducted with 5-fold cross-validation, Transformer-based models such as bert-base-NER and roberta-base achieved 97% accuracy and F1-score, which is 10.23% better performance than the baseline model (RF=0.88).

Takeaways, Limitations

Takeaways:
Presenting the possibility of developing an opioid overdose crisis monitoring and prevention system utilizing social media data.
Demonstrating the effectiveness of extracting and analyzing opioid-related information through the construction of the ONER-2025 dataset and application of NLP techniques.
The superior performance of transformer-based models demonstrates the potential for improving the accuracy of real-time monitoring systems.
Suggesting future research directions through analysis of linguistic features related to opioids.
Limitations:
The need for review of bias and generalizability of Reddit data.
Further research is needed on the practical application and effectiveness of real-time monitoring systems.
Possible degradation of model performance due to limitations in dataset size and diversity.
Further research is needed to explore the applicability of this study to other social media platforms or data sources.
👍