Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Masked Image Modeling: A Survey

Created by
  • Haebom

Authors

Vlad Hondru, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, Nicu Sebe

Outline

This paper surveys recent research on masked image modeling (MIM), which has emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, such as pixels, patches, or latent representations, and training a model, usually an autoencoder, to predict the missing information from the context available in the visible part of the input. The authors identify and formalize two approaches to implementing MIM as a self-supervised learning task, one based on reconstruction and one based on contrastive learning, and they categorize and review the most prominent papers of the past few years. They supplement the manually constructed taxonomy with dendrograms obtained by applying hierarchical clustering algorithms, and manually inspect the resulting dendrograms to identify relevant clusters. The survey also covers the datasets commonly used in MIM research and aggregates the performance results of different masked image modeling methods on the most popular datasets to make competing methods easier to compare. Finally, the authors identify research gaps, suggest several directions for future research, and provide a public repository (https://github.com/vladhondru25/MIM-Survey) containing curated references as supplementary material.
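The reconstruction-based variant of the task described above can be illustrated with a minimal sketch. It is not the survey's code: the helper names (`patchify`, `random_mask`, `masked_mse`, `mean_of_visible`), the 8x8 toy image, the 4x4 patch size, and the 75% mask ratio are all assumptions chosen for illustration, and the "model" is a trivial placeholder that predicts the mean of the visible pixels instead of a trained autoencoder. The point is only the shape of the objective: hide a random subset of patches, predict them from the visible context, and score the prediction on the hidden patches alone.

```python
import random


def patchify(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches.

    Assumes the height and width are divisible by `patch`.
    """
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches to hide from the model."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def mean_of_visible(patches, masked_idx):
    """Placeholder 'model' (hypothetical): fill each masked patch with the
    mean of all visible pixels. Needs at least one visible patch."""
    masked = set(masked_idx)
    visible = [v for k, p in enumerate(patches) if k not in masked for v in p]
    mean = sum(visible) / len(visible)
    return [
        [mean] * len(p) if k in masked else list(p)
        for k, p in enumerate(patches)
    ]


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for k in masked_idx:
        for p, t in zip(pred[k], target[k]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # toy 8x8 image
patches = patchify(image, patch=4)  # four patches of 16 pixels each
masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # assumed ratio
pred = mean_of_visible(patches, masked_idx)
loss = masked_mse(pred, patches, masked_idx)
print(f"masked patches: {masked_idx}, reconstruction loss: {loss:.2f}")
```

In a real setup the placeholder would be replaced by an encoder-decoder network trained to minimize this loss. The contrastive variant mentioned in the summary would instead compare representations of masked and unmasked views, which this sketch does not cover.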

Takeaways, Limitations

Takeaways: The survey clearly summarizes the two major approaches to MIM (reconstruction-based and contrastive learning-based) and systematically classifies existing studies, which makes comparative analysis easier. It aggregates performance results for various MIM methods, which can help in choosing research directions. It suggests promising directions for future research. It makes the collected references more accessible through a public repository.
Limitations: Because the survey covers research only up to a certain point in time, it may not fully reflect the latest trends. The manual classification and the interpretation of the dendrograms involve subjective judgment, so their objectivity is not guaranteed. The datasets and evaluation metrics used for the performance comparison may not be diverse enough.