This paper proposes a spatiotemporal deep learning framework for detecting gait abnormalities in cattle from publicly available video data. We build and release a balanced dataset of 50 video clips covering 42 cattle, and use data augmentation to train and evaluate two deep learning models: a 3D CNN and a ConvLSTM2D model. The 3D CNN achieves 90% video-level classification accuracy, with precision, recall, and F1 score of 90.9% each, outperforming the ConvLSTM2D model (85% accuracy). Unlike existing methods that rely on multi-stage pipelines for object detection and pose estimation, this study shows that a direct end-to-end video classification approach is effective. The approach learns spatiotemporal features directly from diverse video sources, which supports scalable and efficient detection of cattle gait abnormalities in real-world farm environments.
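
For readers who want to see how the four reported metrics relate to one another, the following is a minimal sketch of the standard binary-classification formulas. The counts in `HYPOTHETICAL_COUNTS` are an assumption chosen only because they are arithmetically consistent with the reported figures (90% accuracy, 90.9% precision, recall, and F1 score); they are not the confusion matrix from this study, and the names `Confusion` and `binary_metrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Video-level confusion counts, with 'abnormal gait' as the positive class."""
    tp: int  # abnormal videos predicted abnormal
    fp: int  # normal videos predicted abnormal
    fn: int  # abnormal videos predicted normal
    tn: int  # normal videos predicted normal


def binary_metrics(c: Confusion) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (an assumption, NOT taken from the paper): 20 test
# videos with one false positive and one false negative.
HYPOTHETICAL_COUNTS = Confusion(tp=10, fp=1, fn=1, tn=8)

metrics = binary_metrics(HYPOTHETICAL_COUNTS)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

When the numbers of false positives and false negatives are equal, precision and recall coincide, and F1, as their harmonic mean, takes the same value. This is one way all three metrics can share a single figure while accuracy differs.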