Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TerraMind: Large-Scale Generative Multimodality for Earth Observation

Created by
  • Haebom

Author

Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé

Outline

TerraMind is the first any-to-any generative multimodal model for Earth observation. Unlike other multimodal models, TerraMind is pretrained on a dual-scale representation that combines token-level and pixel-level data across modalities. At the token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while at the pixel level it leverages fine-grained representations to capture important spatial nuances. TerraMind is pretrained on nine geospatial modalities from large-scale global datasets. This paper demonstrates that (i) TerraMind's dual-scale early-fusion approach enables a variety of zero-shot and few-shot applications for Earth observation; (ii) TerraMind introduces a "Thinking in Modalities" (TiM) feature that improves model output by generating additional artificial data during fine-tuning and inference; and (iii) TerraMind achieves state-of-the-art performance on community-standard benchmarks for EO, such as PANGAEA. The pretraining dataset, model weights, and code are open-sourced under a permissive license.
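The TiM idea described above can be sketched as a simple two-step control flow: first generate an artificial intermediate modality from the inputs, then condition the downstream task on the original inputs plus the generated data. The sketch below is a minimal illustration of that flow only; the function names, the `"LULC"` intermediate modality, and the stub bodies are assumptions for illustration, not the actual TerraMind API.

```python
# Minimal sketch of "Thinking in Modalities" (TiM) inference.
# All names here (generate_modality, predict, tim_inference, "LULC")
# are hypothetical stand-ins, not the real TerraMind interface.

def generate_modality(model, inputs, target_modality):
    """Stand-in for the generative step: synthesize artificial data
    for an extra modality (e.g. land-use/land-cover tokens)."""
    return {target_modality: f"synthetic-{target_modality}"}

def predict(model, inputs):
    """Stand-in for the downstream task head (e.g. segmentation),
    conditioned on whatever modalities are present in `inputs`."""
    return f"prediction-from-{'+'.join(sorted(inputs))}"

def tim_inference(model, inputs, intermediate="LULC"):
    # Step 1: "think" by generating an intermediate modality.
    generated = generate_modality(model, inputs, intermediate)
    # Step 2: run the task on the original inputs enriched
    # with the generated modality.
    enriched = {**inputs, **generated}
    return predict(model, enriched)

# Example: a Sentinel-2 input enriched with a generated LULC modality.
result = tim_inference(model=None, inputs={"S2": "sentinel2_tile"})
print(result)
```

The point of the sketch is the enrichment step: the same task head sees both the observed and the generated modalities, which is how TiM improves outputs without requiring extra real labels at inference time.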

Takeaways, Limitations

Takeaways:
Presents the first any-to-any generative multimodal model for Earth observation.
Dual-scale early fusion enables zero-shot and few-shot applications.
The "Thinking in Modalities" (TiM) feature improves model output.
Achieves state-of-the-art performance on benchmarks such as PANGAEA.
Model weights, pretraining data, and code are released under a permissive open-source license.
Limitations:
The paper does not explicitly discuss limitations. Further experiments and evaluations may reveal limitations in generalization performance, performance on specific types of geospatial data, computational cost, and similar concerns.