This paper addresses the problem of classifying 18 Arabic dialects spoken across 22 countries. Using the QADI dataset of dialectal Arabic tweets, we build and evaluate a range of models, from RNNs and Transformer-based models to large language models (LLMs) with prompt engineering. Combining careful preprocessing with state-of-the-art NLP models, we identify the key linguistic challenges in Arabic dialect identification; the MARBERTv2 model achieves the best performance, with 65% accuracy and a 64% F1-score.
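To make the best-performing setup concrete, the sketch below fine-tunes MARBERTv2 for 18-way dialect classification with Hugging Face Transformers. It is a minimal illustration only: the toy dataset, label mapping, and hyperparameters are assumptions for demonstration, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact setup): fine-tuning MARBERTv2 for
# 18-way Arabic dialect classification with Hugging Face Transformers.
# The toy data, labels, and hyperparameters below are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

NUM_DIALECTS = 18  # country-level dialect labels in QADI

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
model = AutoModelForSequenceClassification.from_pretrained(
    "UBC-NLP/MARBERTv2", num_labels=NUM_DIALECTS)

# Toy stand-in for the preprocessed QADI tweets (text, integer dialect label).
train_ds = Dataset.from_dict({
    "text": ["شلونك اليوم؟", "ازيك عامل ايه؟"],
    "label": [0, 1],
})

def tokenize(batch):
    # Tokenize tweets to fixed-length inputs for the classifier.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="marbertv2-qadi",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In practice the same pattern applies to the other Transformer baselines by swapping the model identifier, while accuracy and macro F1 would be computed on a held-out QADI test split.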