In this paper, we propose an end-to-end deep learning framework that combines optical motion capture with Transformer-based models to support medical rehabilitation. The framework addresses two problems: motion capture data that are noisy or incomplete because of occlusion and environmental factors, and the need to detect abnormal movements in real time to ensure patient safety. To improve robustness, we use temporal sequence modeling to denoise the motion capture data and fill in missing segments. Evaluations on stroke and orthopedic rehabilitation datasets show strong performance in both data reconstruction and anomaly detection, indicating that the framework offers a scalable and cost-effective option for telerehabilitation with reduced on-site supervision.