This paper compares the performance of deep learning (DL) and classical machine learning (ML) algorithms for classifying 24-hour movement behavior into sleep, sedentary activity, low-intensity physical activity (LPA), and moderate-to-vigorous physical activity (MVPA). We used publicly available data from 151 adults wearing wrist-worn accelerometers (Axivity AX3). Participants were randomly divided into training, validation, and test sets. Raw acceleration signals were segmented into non-overlapping 10-second intervals, and a total of 104 handcrafted features were extracted. Four DL algorithms (LSTM, BiLSTM, GRU, and 1D-CNN) were each trained on the raw acceleration signals and, separately, on the handcrafted features. In addition, six classical ML algorithms (Random Forest, Support Vector Machine (SVM), XGBoost, Logistic Regression, artificial neural network (ANN), and Decision Tree) were trained on the handcrafted features. LSTM, BiLSTM, and GRU trained on raw acceleration signals achieved an accuracy of approximately 85%, and 1D-CNN achieved approximately 80%. The accuracies of DL and classical ML algorithms trained on handcrafted features ranged from 70% to 81%. Classification confusion was greater for MVPA and LPA than for sleep and sedentary activity. In conclusion, DL methods trained on raw acceleration signals performed slightly better than DL and classical ML methods trained on handcrafted features in predicting 24-hour movement activity intensity.
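The segmentation step described above can be illustrated with a minimal sketch. It assumes a tri-axial signal stored as a NumPy array and a hypothetical sampling rate of 100 Hz, since the sampling rate is not stated here. The function names `segment_windows` and `basic_features` are illustrative, and the eight features computed are simple examples, not the 104 handcrafted features used in this study.

```python
import numpy as np

def segment_windows(signal, sampling_rate_hz, window_seconds=10):
    """Split an (n_samples, n_axes) array into non-overlapping windows.

    Trailing samples that do not fill a complete window are dropped.
    """
    samples_per_window = int(sampling_rate_hz * window_seconds)
    n_windows = signal.shape[0] // samples_per_window
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window, signal.shape[1])

def basic_features(windows):
    """Illustrative per-window features (NOT the study's 104 features):
    mean and standard deviation per axis, plus mean and standard
    deviation of the acceleration vector magnitude."""
    magnitude = np.linalg.norm(windows, axis=2)
    return np.concatenate(
        [
            windows.mean(axis=1),
            windows.std(axis=1),
            magnitude.mean(axis=1, keepdims=True),
            magnitude.std(axis=1, keepdims=True),
        ],
        axis=1,
    )

# Synthetic stand-in for raw data: 65 s at an assumed 100 Hz, 3 axes.
rng = np.random.default_rng(0)
raw = rng.normal(size=(100 * 65, 3))

windows = segment_windows(raw, sampling_rate_hz=100)
features = basic_features(windows)
print(windows.shape)   # (6, 1000, 3)
print(features.shape)  # (6, 8)
```

In a setup like this, the `windows` array would correspond to the raw-signal input for the DL models, and the `features` matrix to the input for models trained on handcrafted features.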