This study presents an automatic region-of-interest (ROI) detection system that combines deep learning with explainable AI (xAI) techniques to improve the efficiency and objectivity of MRI interpretation, which is essential for evaluating knee injuries. Several deep learning architectures, including ResNet50, InceptionV3, Vision Transformers (ViT), and multiple U-Net variants, were evaluated under supervised and self-supervised learning. xAI techniques such as Grad-CAM and saliency maps were integrated to enhance interpretability. Performance was assessed using area under the curve (AUC) for classification, PSNR and SSIM for reconstruction quality, and qualitative ROI visualization. On the MRNet dataset, ResNet50 outperformed Transformer-based models in both classification and ROI identification. A combined U-Net + MLP model showed potential for improved reconstruction and interpretability but achieved lower classification performance, while Grad-CAM provided the most clinically meaningful explanations across all architectures. In conclusion, CNN-based transfer learning was the most effective approach on this dataset; Transformer models are expected to improve with large-scale pretraining.