Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation

Created by
  • Haebom

Authors

Hariprasath Govindarajan, Maciej K. Wozniak, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani

Outline

This paper proposes CleverDistiller, a self-supervised cross-modal knowledge distillation (KD) framework that transfers generalizable features from 2D image-based Vision Foundation Models (VFMs) to 3D LiDAR-based models. Previous approaches rely on complex loss functions and pseudo-semantic maps, and limit knowledge transfer to semantic segmentation. CleverDistiller instead learns complex semantic dependencies through simple yet effective design choices and transfers knowledge directly from VFMs without pseudo-semantic maps. It also introduces an auxiliary self-supervised spatial task, occupancy prediction, which strengthens 3D spatial reasoning on top of the semantic knowledge acquired from VFMs. Experiments on autonomous driving benchmarks show that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection, with gains becoming more pronounced when fine-tuning with limited data.
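To make the two training signals described above concrete, here is a minimal sketch of how a feature distillation term and an auxiliary occupancy term might be combined. The summary does not state the actual loss functions, so everything here is an assumption for illustration: the cosine-distance distillation term, the binary cross-entropy occupancy term, the `occ_weight` parameter, and all function names are hypothetical and are not taken from the paper.

```python
import math

# Illustrative sketch only: the loss forms and names below are assumptions,
# not the formulation used by CleverDistiller.

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)

def distillation_loss(student_feats, teacher_feats):
    """Mean cosine distance over paired features.

    student_feats: features of 3D LiDAR points (student model).
    teacher_feats: 2D VFM features at the pixels those points project to.
    """
    pairs = list(zip(student_feats, teacher_feats))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)

def occupancy_loss(pred_probs, occupied):
    """Binary cross-entropy between predicted occupancy probabilities and 0/1 labels."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(pred_probs, occupied):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred_probs)

def total_loss(student_feats, teacher_feats, pred_probs, occupied, occ_weight=1.0):
    """Distillation term plus a weighted auxiliary occupancy term (weight is hypothetical)."""
    distill = distillation_loss(student_feats, teacher_feats)
    return distill + occ_weight * occupancy_loss(pred_probs, occupied)

if __name__ == "__main__":
    student = [[1.0, 0.0], [0.0, 2.0]]
    teacher = [[2.0, 0.0], [0.0, 1.0]]
    print(total_loss(student, teacher, pred_probs=[0.9, 0.2], occupied=[1, 0]))
```

In this sketch the distillation term depends only on feature direction, so a student feature that is a positive multiple of the teacher feature incurs almost no penalty. The occupancy term is self-supervised in the sense that the 0/1 labels could be derived from the point cloud itself rather than from annotations.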

Takeaways, Limitations

Takeaways:
A simple and effective self-supervised cross-modal knowledge distillation framework that transfers knowledge from 2D VFMs to 3D LiDAR models.
Direct knowledge transfer from VFMs is possible without pseudo-semantic maps.
Performance gains are especially pronounced when fine-tuning with limited data.
State-of-the-art performance in both semantic segmentation and 3D object detection.
Strong performance without complex loss function design.
Limitations:
Further analysis of the generalization performance of the proposed method is needed.
Additional experiments with different LiDAR sensors and datasets are needed.
Applicability to other types of VFMs, and the resulting performance, still need to be evaluated.