Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis

Created by
  • Haebom

Author

Omar A. Essameldin, Ali O. Elbeih, Wael H. Gomaa, Wael F. Elsersy

Outline

This paper addresses the problem of classifying the 18 Arabic dialects spoken across 22 countries. Using the QADI dataset of Arabic tweets, the authors build and test RNNs, Transformer models, and large language models (LLMs) via prompt engineering. Applying modern preprocessing techniques and state-of-the-art NLP models, they identify the key linguistic challenges in Arabic dialect identification; MARBERTv2 achieves the best performance, with 65% accuracy and a 64% F1-score.
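As a rough illustration of the Transformer-based setup described above, here is a minimal sketch (not the authors' code) of fine-tuning-style inference with a MARBERT-style encoder for 18-way dialect classification using Hugging Face Transformers. The checkpoint name "UBC-NLP/MARBERTv2", the label count, and the example tweet are assumptions for illustration only.

```python
# Minimal sketch: 18-way Arabic dialect classification with a MARBERT-style encoder.
# Checkpoint name and label count are assumptions, not taken from the paper's code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "UBC-NLP/MARBERTv2"  # assumed Hugging Face checkpoint
NUM_DIALECTS = 18                 # per the paper's 18-dialect setup

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_DIALECTS
)

# Classify a single (hypothetical) tweet; in practice the classification head
# would first be fine-tuned on the QADI training split.
text = "example tweet text"
inputs = tokenizer(text, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_dialect_id = logits.argmax(dim=-1).item()
print(predicted_dialect_id)
```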

Takeaways, Limitations

Takeaways:
  • Provides a performance evaluation and comparison of state-of-the-art NLP models for Arabic dialect identification.
  • Could contribute to a variety of applications, including personalized chatbots that respond in users' dialects, social media monitoring, and improved accessibility for Arabic speakers.
  • Highlights key linguistic challenges in Arabic dialect identification and suggests future research directions.
Limitations:
  • The best model, MARBERTv2, reaches only 65% accuracy and a 64% F1-score, leaving substantial room for improvement.
  • Further detail is needed on the size and diversity of the QADI dataset used.
  • There is no analysis of possible bias toward, or overfitting to, specific dialects.