Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Prompt-aware classifier free guidance for diffusion models

Created by
  • Haebom

Author

Xuanhao Zhang, Chang Li

Outline

Diffusion models have achieved significant advances in image and audio generation through classifier-free guidance (CFG), but guidance-scale selection has remained understudied. A fixed scale often fails to generalize across prompts of varying complexity, leading to oversaturation or weak prompt alignment. This paper addresses the gap by introducing a prompt-aware framework that predicts scale-dependent quality and selects the optimal guidance scale at inference time. Specifically, the authors build a large-scale synthetic dataset by generating samples at multiple guidance scales and scoring them with reliable evaluation metrics. A lightweight predictor conditioned on semantic embeddings and linguistic complexity estimates a multi-metric quality curve, and a regularized utility function determines the optimal scale. Experiments on MSCOCO 2014 and AudioCaps demonstrate consistent improvements over vanilla CFG in fidelity, alignment, and perceptual quality. The study shows that prompt-aware scale selection provides an effective, training-free enhancement to pretrained diffusion backbones.
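The mechanics described above can be sketched in a few lines. The `cfg_noise` function is the standard classifier-free guidance extrapolation; `select_scale` is a hypothetical illustration of the paper's idea, where `predictor`, the feature layout, and the regularizer `lam` are all assumptions for the sketch, not the authors' actual implementation:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: extrapolate from the
    unconditional noise prediction toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def select_scale(prompt_embedding, complexity, predictor, scales, lam=0.1):
    """Hypothetical prompt-aware scale selection: a lightweight
    predictor maps (semantic embedding, linguistic complexity) to a
    per-scale quality curve; a regularized utility picks the scale,
    penalizing extreme values to avoid oversaturation."""
    features = np.concatenate([prompt_embedding, [complexity]])
    quality = predictor(features, scales)                   # predicted multi-metric quality curve
    utility = quality - lam * (scales - scales.mean()) ** 2  # regularization toward moderate scales
    return scales[int(np.argmax(utility))]
```

At inference, `select_scale` would run once per prompt and the chosen scale would then be passed to `cfg_noise` at every denoising step, so the pretrained backbone itself needs no retraining.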

Takeaways, Limitations

Takeaways:
  • Prompt-aware guidance-scale selection improves the output quality of diffusion models.
  • Choosing a scale that adapts to prompt complexity mitigates oversaturation and weak alignment.
  • The method is a training-free enhancement to pretrained diffusion backbones.
  • Experiments show it applies to both image and audio generation tasks.
Limitations:
  • More detail is needed on the framework's concrete implementation and the magnitude of the performance gains.
  • The complexity and computational cost of the lightweight predictor are not reported.
  • Generalizability to other diffusion-model architectures requires further study.
  • Scalability of the proposed method in real-world applications remains to be evaluated.