Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling

Created by
  • Haebom

Authors

Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama

Outline

This paper proposes a dataset distillation technique that uses a generative model to reduce the dependency on large datasets. Whereas existing methods focus on consistency with the original dataset, the proposed method adopts a task-specific sampling strategy aimed at improving performance on a particular downstream task, such as classification. It builds the distilled dataset by sampling from an image pool according to a distribution that matches the difficulty distribution of the original dataset, and it applies a log transformation as a preprocessing step to correct distribution bias. Extensive experiments verify the effectiveness of the method and suggest that it may extend to other downstream tasks. The code is available on GitHub.
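To make the sampling idea concrete, here is a minimal sketch of difficulty-matched sampling from an image pool. It is not the authors' implementation: the function name `difficulty_guided_sample`, the histogram-based matching, the bin count, and the choice to apply the log transformation directly to the difficulty scores are all assumptions made for illustration. Difficulty scores are assumed to be positive numbers computed elsewhere (for example, a per-sample loss from some classifier).

```python
import numpy as np


def difficulty_guided_sample(pool_difficulty, target_difficulty, n_samples,
                             n_bins=10, eps=1e-8, seed=0):
    """Pick pool indices whose difficulty histogram follows the target's.

    Hypothetical helper: histogram matching is an assumed stand-in for
    whatever matching procedure the paper actually uses.
    """
    rng = np.random.default_rng(seed)

    # Log transformation as preprocessing (assumed to act on the scores).
    pool_log = np.log(np.asarray(pool_difficulty, dtype=float) + eps)
    target_log = np.log(np.asarray(target_difficulty, dtype=float) + eps)

    # Shared, equally spaced bins covering both sets of scores.
    lo = min(pool_log.min(), target_log.min())
    hi = max(pool_log.max(), target_log.max())
    inner_edges = np.linspace(lo, hi, n_bins + 1)[1:-1]

    # Bin index (0 .. n_bins - 1) for every pool and target sample.
    pool_bin = np.digitize(pool_log, inner_edges)
    target_bin = np.digitize(target_log, inner_edges)

    pool_count = np.bincount(pool_bin, minlength=n_bins)
    target_prob = np.bincount(target_bin, minlength=n_bins) / len(target_log)

    # Each pool sample gets its bin's target mass, split evenly among the
    # pool samples in that bin. pool_count is >= 1 wherever it is indexed.
    weights = target_prob[pool_bin] / pool_count[pool_bin]
    weights = weights / weights.sum()

    # Sampling without replacement raises ValueError if fewer than
    # n_samples pool items have non-zero weight.
    return rng.choice(len(pool_log), size=n_samples, replace=False, p=weights)


# Toy usage: the target contains only "hard" samples, so only the two
# hard pool items (indices 4 and 5) can be selected.
picked = difficulty_guided_sample([1, 1, 1, 1, 100, 100], [100, 100, 100],
                                  n_samples=2, n_bins=2)
```

Splitting each bin's target mass evenly across the pool items in that bin keeps the expected histogram of the sampled subset close to the target histogram, even when the pool itself is skewed toward easy or hard images.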

Takeaways, Limitations

Takeaways:
Suggests that task-specific sampling strategies can improve performance on downstream tasks.
Introduces a new perspective, sample difficulty, to generative model-based dataset distillation.
Confirms that a log transformation helps correct distribution bias.
Supports reproducibility and extensibility by releasing the code for the proposed method.
Limitations:
The work currently covers only classification tasks; generalizability to other downstream tasks requires further study.
The proposed difficulty measure has limitations and leaves room for improvement.
Further experiments are needed to determine whether the performance gains observed on specific datasets and tasks generalize to others.