Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Diverse Text-to-Image Generation via Contrastive Noise Optimization

Created by
  • Haebom

Author

Byungjun Kim, Soobin Um, Jong Chul Ye

Outline

Text-to-image (T2I) diffusion models excel at generating high-fidelity images, but they suffer from limited diversity because strong text guidance drives samples toward similar modes. This paper proposes Contrastive Noise Optimization (CNO), a novel method that addresses this issue by manipulating the initial noise rather than intermediate latents or text conditions, as existing methods do. Specifically, CNO optimizes a batch of initial noises using a contrastive loss defined in the Tweedie (denoised) data space: the contrastive objective pushes instances within a batch away from one another to maximize diversity, while anchoring them to a reference sample to preserve fidelity. Experiments on multiple T2I backbones show that the method sits well on the quality-diversity Pareto frontier and is robust to hyperparameter choices.
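The abstract describes the idea but not the implementation. The following is a minimal, hypothetical PyTorch sketch of how such an objective might be set up; the `score_fn` placeholder, the cosine-similarity form of the contrastive terms, and the `temperature` and `anchor_weight` parameters are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

# Placeholder score function (score of a standard Gaussian), standing in for the
# text-conditioned diffusion model's score / noise-prediction network.
score_fn = lambda x: -x

def tweedie_denoise(x_t, score_fn, sigma):
    """Tweedie's formula: E[x_0 | x_t] = x_t + sigma**2 * score(x_t)."""
    return x_t + sigma ** 2 * score_fn(x_t)

def contrastive_noise_loss(noise, ref_noise, score_fn, sigma=1.0,
                           temperature=0.1, anchor_weight=1.0):
    """Contrastive objective in the Tweedie data space: repel batch members
    from each other (diversity) while pulling them toward a reference
    denoised sample (fidelity)."""
    x0 = tweedie_denoise(noise, score_fn, sigma).flatten(1)          # (B, D)
    x0_ref = tweedie_denoise(ref_noise, score_fn, sigma).flatten(1)  # (1, D)

    # Repulsion: mean pairwise cosine similarity over off-diagonal pairs.
    sim = F.cosine_similarity(x0.unsqueeze(1), x0.unsqueeze(0), dim=-1)  # (B, B)
    mask = ~torch.eye(sim.size(0), dtype=torch.bool)
    repulsion = sim[mask].mean() / temperature

    # Attraction: keep every member close to the reference sample.
    attraction = (1.0 - F.cosine_similarity(x0, x0_ref, dim=-1)).mean()

    return repulsion + anchor_weight * attraction

# Optimize a batch of initial noises before handing them to the sampler.
batch, dim = 4, 16 * 16 * 4
noise = torch.randn(batch, dim, requires_grad=True)
ref_noise = torch.randn(1, dim)
optimizer = torch.optim.Adam([noise], lr=1e-2)

for _ in range(100):
    optimizer.zero_grad()
    loss = contrastive_noise_loss(noise, ref_noise, score_fn, sigma=1.0)
    loss.backward()
    optimizer.step()
# The optimized noise tensors would then seed the usual T2I sampling process.
```

In this sketch the fidelity anchor is a single reference noise; in practice the reference and the exact form of the contrastive loss would follow the paper's formulation.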

Takeaways, Limitations

Takeaways:
Addresses the diversity problem in image generation from a new angle: manipulating the initial noise.
Achieves effective diversity by using a contrastive loss in the Tweedie data space.
Demonstrates strong quality-diversity performance in experiments across multiple T2I backbones.
Robust to hyperparameter choices, which improves ease of use.
Limitations:
Specific limitations are not stated in the abstract and cannot be determined from the summary alone.