Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

A Comprehensive Survey on Data Augmentation

Created by
  • Haebom

Authors

Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun Zhou

Outline

This paper surveys data augmentation, a set of techniques that generate high-quality artificial data by manipulating existing data samples. Data augmentation broadens the applicability of AI models and significantly improves their generalization on tasks with sparse or imbalanced datasets. Unlike earlier surveys that each focus on a single modality, this paper summarizes augmentation techniques across multiple modalities in a consistent way, with an emphasis on how existing data samples are used in the augmentation process. It proposes a novel taxonomy that covers augmentation techniques for common data modalities and examines methods that leverage inter-instance and intrinsic relationships. It then categorizes augmentation methods for five data modalities under a unified inductive approach.
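As a rough illustration of the distinction drawn above, the sketch below contrasts an augmentation that uses a single existing sample with one that combines two samples. The function names `jitter` and `mixup_pair` are hypothetical, and the choice of Gaussian noise and mixup-style interpolation as examples is an assumption made for illustration. It is not taken from the survey, and the survey may group these techniques differently.

```python
import random


def jitter(sample, scale=0.1, rng=None):
    """Single-sample augmentation: add Gaussian noise to each feature.

    Hypothetical helper for illustration; `scale` is the noise standard deviation.
    """
    if rng is None:
        rng = random.Random()
    return [x + rng.gauss(0.0, scale) for x in sample]


def mixup_pair(sample_a, sample_b, lam):
    """Inter-instance augmentation: convex combination of two samples.

    Hypothetical helper for illustration; `lam` in [0, 1] weights `sample_a`.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(sample_a, sample_b)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    a = [0.0, 2.0, 4.0]
    b = [2.0, 0.0, 4.0]
    print("jittered:", jitter(a, scale=0.05, rng=rng))
    print("mixed:", mixup_pair(a, b, lam=0.5))
```

The first function needs only one existing sample, while the second cannot be applied without a second one. That dependence on how existing samples take part is the kind of property the summary says the taxonomy is organized around.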

Takeaways, Limitations

Provides a comprehensive overview of data augmentation techniques across various data modalities.
Explores data augmentation methods that leverage inter-instance and intrinsic relationships.
Systematizes data augmentation techniques through a unified classification method.
Improves the general understanding of data augmentation by addressing the limitations of earlier research restricted to specific modalities.
May lack detail on specific methodologies, experimental results, and practical application cases.
Further validation of the proposed classification method's effectiveness and generalization ability is needed.
Applicability and extensibility to modalities beyond the five covered may be limited.