Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SpectralEarth: Training Hyperspectral Foundation Models at Scale

Created by
  • Haebom

Authors

Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, Xiao Xiang Zhu

Outline

This paper introduces SpectralEarth, a large-scale, multi-temporal hyperspectral image dataset built from Environmental Mapping and Analysis Program (EnMAP) data. SpectralEarth comprises 538,974 image patches covering 415,153 unique locations, drawn from 11,636 globally distributed EnMAP scenes; 17.5% of the locations have multiple timestamps, which enables multi-temporal analysis. The authors pretrain hyperspectral foundation models on SpectralEarth with state-of-the-art self-supervised learning algorithms, integrating a spectral adapter into existing vision backbones to accommodate the characteristics of hyperspectral imagery (HSI). They also build nine downstream datasets for land cover, crop type mapping, and tree species classification as benchmarks for model evaluation. Experimental results demonstrate the models' versatility and generalization across a variety of tasks and sensors, as well as their computational efficiency during fine-tuning.
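The dataset figures above imply a rough revisit rate, which can be worked out with a little arithmetic. The sketch below assumes the 17.5% share refers to unique locations and that every location contributes at least one patch; the function name `revisit_stats` is only illustrative and does not come from the paper.

```python
# Figures quoted in the summary above.
TOTAL_PATCHES = 538_974
UNIQUE_LOCATIONS = 415_153
MULTI_TIMESTAMP_SHARE = 0.175  # assumption: a share of unique locations


def revisit_stats(total_patches, unique_locations, multi_share):
    """Estimate how often multi-timestamp locations are revisited."""
    # Patches beyond the first acquisition at each location.
    repeat_patches = total_patches - unique_locations
    # Estimated number of locations observed more than once.
    multi_locations = round(unique_locations * multi_share)
    # Average acquisitions per revisited location (first visit plus repeats).
    avg_timestamps = 1 + repeat_patches / multi_locations
    return repeat_patches, multi_locations, avg_timestamps


repeats, multi, avg = revisit_stats(
    TOTAL_PATCHES, UNIQUE_LOCATIONS, MULTI_TIMESTAMP_SHARE
)
print(f"{repeats} repeat patches over ~{multi} locations, ~{avg:.1f} timestamps each")
```

Under these assumptions, roughly 72,650 locations are observed more than once, with about 2.7 acquisitions each on average. This is an estimate derived from the headline numbers, not a statistic reported by the authors.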

Takeaways, Limitations

Takeaways:
Provides SpectralEarth, a globally distributed, large-scale, multi-temporal hyperspectral image dataset, advancing research on hyperspectral foundation models.
Presents a self-supervised pretraining method for hyperspectral foundation models and verifies its performance on a variety of downstream tasks.
Presents an approach that improves the computational efficiency of model fine-tuning.
Provides benchmark datasets for a variety of downstream tasks.
Limitations:
Relying solely on EnMAP data may limit the diversity of the dataset.
The data distribution may be biased toward specific regions or environments.
The self-supervised learning algorithms used require further performance analysis.
Generalization to data from other hyperspectral sensors requires further research.
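The last limitation concerns transfer to other hyperspectral sensors, which typically differ in the number of spectral bands. The summary does not describe how the paper's spectral adapter is built, so the following is only a generic sketch of the idea: a 1D convolution along the spectral axis followed by adaptive average pooling, which maps a spectrum of any length to a fixed number of channels that a standard vision backbone could consume. The function `spectral_adapter`, the kernel, the stride, and the band counts are all hypothetical choices made for illustration.

```python
def spectral_adapter(spectrum, kernel, stride, out_channels):
    """Map one pixel's spectrum (any band count) to `out_channels` values.

    Hypothetical sketch, not the paper's adapter.
    """
    k = len(kernel)
    # 1D convolution along the spectral axis, without padding.
    conv = [
        sum(w * spectrum[i + j] for j, w in enumerate(kernel))
        for i in range(0, len(spectrum) - k + 1, stride)
    ]
    # Adaptive average pooling: split the response into out_channels bins.
    n = len(conv)
    pooled = []
    for c in range(out_channels):
        start = (c * n) // out_channels
        end = max(start + 1, ((c + 1) * n) // out_channels)
        segment = conv[start:end]
        pooled.append(sum(segment) / len(segment))
    return pooled


smoothing = [1 / 3, 1 / 3, 1 / 3]  # illustrative fixed kernel; a real adapter would learn it
sensor_a = [1.0] * 200  # hypothetical sensor with 200 bands
sensor_b = [1.0] * 120  # hypothetical sensor with 120 bands
features_a = spectral_adapter(sensor_a, smoothing, stride=2, out_channels=4)
features_b = spectral_adapter(sensor_b, smoothing, stride=2, out_channels=4)
print(len(features_a), len(features_b))
```

Both inputs come out with the same number of channels even though their band counts differ, which is the property that would let one backbone serve several sensors. Whether features learned this way actually transfer is the open question the limitation raises.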