Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Depth-Guided Self-Supervised Human Keypoint Detection via Cross-Modal Distillation

Created by
  • Haebom

Author

Aman Anand, Elyas Rashno, Amir Eskandari, Farhana Zulkernine

Outline

Existing unsupervised keypoint detection methods rely on artificial transformations, such as masking significant portions of the image, or use reconstruction of the original image as the learning objective. However, these approaches lack depth information and often detect keypoints in the background. To address this issue, we propose Distill-DKP, a novel cross-modal knowledge distillation framework that uses depth maps alongside RGB images to detect keypoints in a self-supervised manner. During training, Distill-DKP transfers embedding-level knowledge from a depth-based teacher model to guide an image-based student model; at inference, only the student model is used. Experimental results demonstrate that Distill-DKP significantly outperforms existing unsupervised methods, reducing the average L2 error by 47.15% on the Human3.6M dataset, reducing the average error by 5.67% on the Taichi dataset, and improving keypoint accuracy by 1.3% on the DeepFashion dataset. A detailed ablation study shows how sensitive knowledge distillation is to the choice of network layers.
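The core idea of embedding-level distillation can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' implementation: the exact loss form, the function name `embedding_distillation_loss`, and the embedding shapes are all assumptions for demonstration purposes.

```python
import numpy as np

def embedding_distillation_loss(student_emb, teacher_emb):
    """Toy embedding-level distillation loss: mean squared error between
    L2-normalized student and teacher feature vectors. The paper's exact
    objective may differ; this only illustrates the general idea of
    aligning the image-based student's embeddings with the depth-based
    teacher's embeddings."""
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    return float(np.mean((s - t) ** 2))

# Hypothetical example: a batch of 4 samples with 8-dim embeddings.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))   # from the depth-based teacher encoder
student = rng.normal(size=(4, 8))   # from the image-based student encoder
loss = embedding_distillation_loss(student, teacher)
```

During training, a gradient of such a loss would flow only into the student, so the depth branch can be discarded at inference, which matches the paper's student-only inference setup.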

Takeaways, Limitations

Takeaways:
We show that utilizing depth information can significantly improve the accuracy of unsupervised keypoint detection.
We present a method to effectively transfer knowledge from the teacher model to the student model through a cross-modal knowledge distillation framework.
The method achieves superior performance compared to existing methods on the Human3.6M, Taichi, and DeepFashion datasets.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Performance evaluation is required for various types of image data.
Further research is needed on the optimal layers and hyperparameter settings for knowledge distillation.