Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Dataset Condensation with Color Compensation

Created by
  • Haebom

Authors

Huyu Wu, Duo Su, Junjie Hou, Guang Li

Outline

This paper emphasizes the importance of color in dataset condensation and proposes DC3 (Dataset Condensation with Color Compensation), a novel method that addresses the shortcomings of existing approaches. We point out that existing image-level selection methods (Coreset Selection and Dataset Quantization) condense datasets inefficiently, while pixel-level optimization methods (Dataset Distillation) introduce semantic distortion due to over-parameterization. After a compensated selection strategy picks the images to keep, DC3 uses a latent diffusion model to enhance the color diversity of those existing images rather than generating new ones. DC3 outperforms existing state-of-the-art (SOTA) methods on various benchmarks and generalizes well, and it is the first study to fine-tune a pre-trained diffusion model on a condensed dataset. Our FID results show that networks can be trained on our high-quality condensed datasets without model collapse or performance degradation.
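For reference, the FID results above use the standard Fréchet Inception Distance. With mu_r, Sigma_r and mu_g, Sigma_g denoting the mean and covariance of Inception-network features for a reference image set and a generated image set, FID is defined as follows (lower is better). The summary does not say which image sets DC3 compares, so the subscripts here are generic placeholders.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```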

Takeaways, Limitations

Takeaways:
We highlight the importance of color in dataset condensation and show that improving color in condensed images leads to better performance.
We present a novel method that uses a latent diffusion model to effectively enhance the color diversity of images.
We demonstrate that a pre-trained diffusion model can be fine-tuned on a condensed dataset.
We achieve SOTA performance across various benchmarks.
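The summary does not say how DC3 quantifies color diversity. As a hypothetical illustration only, the colorfulness metric of Hasler and Süsstrunk (2003), a standard image colorfulness measure, could be used to compare an image before and after color enhancement. This is not claimed to be DC3's metric. The function name `colorfulness` and the list-of-RGB-tuples input format are assumptions made for this sketch.

```python
import math
from statistics import fmean, pstdev


def colorfulness(pixels):
    """Hasler & Süsstrunk (2003) colorfulness of an RGB image.

    Hypothetical helper for illustration, not part of DC3.
    `pixels` is a non-empty sequence of (R, G, B) tuples, channels in 0..255.
    A grayscale image scores 0; more colorful images score higher.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    # Opponent color channels: red-green and yellow-blue.
    rg = [r - g for r, g, _ in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Spread of the opponent channels (population standard deviations).
    std_term = math.hypot(pstdev(rg), pstdev(yb))
    # Magnitude of the mean opponent color, weighted by 0.3 as in the metric.
    mean_term = math.hypot(fmean(rg), fmean(yb))
    return std_term + 0.3 * mean_term
```

For example, a purely gray image scores 0.0, while a solid red image scores about 85.5, since only the weighted mean term contributes when every pixel has the same color.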
Limitations:
Further analysis may be required to evaluate the efficiency and scalability of the proposed method.
Further research may be needed to determine whether the method applies only to specific types of datasets or to datasets in general.
The computational cost and complexity of using latent diffusion models may also need to be considered.