Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

Created by
  • Haebom

Authors

Jisung Hwang, Jaihoon Kim, Minhyuk Sung

Outline

This paper proposes a novel regularization loss that encourages samples to conform to a standard Gaussian distribution, which facilitates a range of downstream tasks that involve optimization in the latent space of text-to-image models. The authors treat the elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power-spectrum-based regularization in the spectral domain. Because the expected values of the moments and of the power spectrum are known analytically, the loss can push samples toward those values. To ensure permutation invariance, the loss is applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within this unified framework: some correspond to moment losses of a specific order, while the previous covariance-matching loss is equivalent to the spectral loss but has higher time complexity because it is computed in the spatial domain. The authors demonstrate the regularization in generative modeling, applying it to test-time reward alignment with a text-to-image model, specifically to improve aesthetics and text alignment. The proposed regularization outperforms existing Gaussianity regularization, effectively prevents reward hacking, and speeds up convergence.
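The composite loss described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not give the exact formulation, so the function names, the choice of moment orders 1 to 4, the band-averaging of the power spectrum, and the parameters `num_bands`, `num_perms`, and `weight` are all assumptions made for illustration. The sketch relies on two known facts: for z ~ N(0, 1), E[z^k] is 0 for odd k and (k - 1)!! for even k, and for n i.i.d. standard Gaussian values the expected squared DFT magnitude divided by n is 1 at every frequency.

```python
import math
import numpy as np


def gaussian_moment(k: int) -> float:
    """E[z**k] for z ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(k - 1, 0, -2)))


def moment_loss(x, orders=(1, 2, 3, 4)) -> float:
    """Squared gap between empirical moments and standard Gaussian moments.

    The orders used here are an illustrative assumption.
    """
    x = np.ravel(np.asarray(x, dtype=np.float64))
    return float(sum((np.mean(x**k) - gaussian_moment(k)) ** 2 for k in orders))


def spectrum_loss(x, num_bands: int = 16) -> float:
    """Squared gap between band-averaged power and its expected value of 1.

    Band averaging is a simplification chosen for this sketch; it assumes
    the sample has many more elements than `num_bands`.
    """
    x = np.ravel(np.asarray(x, dtype=np.float64))
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size  # expectation 1 per bin
    band_means = np.array([b.mean() for b in np.array_split(power, num_bands)])
    return float(np.mean((band_means - 1.0) ** 2))


def gaussianity_loss(x, rng, num_perms: int = 4, weight: float = 1.0) -> float:
    """Moment loss plus spectral loss averaged over random permutations."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    # The moment term ignores element order, so only the spectral term
    # needs the random permutations.
    spectral = np.mean(
        [spectrum_loss(rng.permutation(x)) for _ in range(num_perms)]
    )
    return moment_loss(x) + weight * float(spectral)
```

With this sketch, an i.i.d. standard Gaussian sample should score close to zero on both terms. Sorting that sample leaves every moment unchanged but concentrates power at low frequencies, so only `spectrum_loss` detects the change, which suggests why the two terms are complementary.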

Takeaways, Limitations

Takeaways:
Proposes a novel regularization loss that encourages samples to align with a standard Gaussian distribution.
Presents a unified framework that combines moment-based regularization in the spatial domain with power-spectrum-based regularization in the spectral domain.
Demonstrates better performance than existing Gaussianity regularization, along with prevention of reward hacking and faster convergence.
Demonstrates applicability to text-to-image models for improving aesthetics and text alignment.
Limitations:
Further research is needed on the generalization performance of the proposed regularization loss.
Extensive experimental validation is needed for various text-to-image models and downstream tasks.
Potential increase in computational costs due to high-dimensional data processing.