Daily Arxiv

This page collects papers on artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, simply cite the source.

PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization

Created by
  • Haebom

Authors

Dawei Xiang, Wenyan Xu, Kexin Chu, Tianqi Ding, Zixu Shen, Yiming Zeng, Jianchang Su, Wei Zhang

Outline

This paper presents PromptSculptor, a multi-agent framework for text-to-image (T2I) prompt optimization. Despite advances in generative AI, users still have to refine detailed prompts repeatedly to obtain high-quality images. PromptSculptor addresses this by using four specialized agents to automate the transformation of short, vague user prompts into comprehensive, refined prompts. It uses Chain-of-Thought reasoning to infer hidden context and enrich scene and background details, then iteratively refines the prompt through self-evaluation and feedback-adjustment agents. Experimental results show that PromptSculptor improves output quality and reduces the number of iterations needed to reach user satisfaction. Its model-independent design allows integration with various T2I models.
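The pipeline described above can be pictured with a minimal sketch. This is not the authors' implementation: the function `sculpt`, the `SculptResult` type, the prompt templates, the `score` callback, and the `threshold` and `max_rounds` parameters are all hypothetical names chosen for illustration, and the summary does not specify how the four agents are actually divided. The sketch only assumes that a language model can be called as a text-in, text-out function.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: any text-in, text-out function can stand in for an LLM call.
LLM = Callable[[str], str]


@dataclass
class SculptResult:
    prompt: str                      # final refined prompt
    rounds: int                      # number of refinement rounds performed
    history: List[str] = field(default_factory=list)  # every draft produced


def sculpt(user_prompt: str,
           llm: LLM,
           score: Callable[[str], float],
           threshold: float = 0.8,
           max_rounds: int = 3) -> SculptResult:
    """Hypothetical four-stage loop loosely following the summary above."""
    # Stage 1 (assumed): Chain-of-Thought style step that infers hidden intent.
    intent = llm(
        "Think step by step. What does the user most likely want to depict "
        f"with this short prompt: {user_prompt!r}?"
    )
    # Stage 2 (assumed): enrich the inferred intent with scene and background.
    prompt = llm(
        f"Write a detailed image prompt for: {intent}. "
        "Add scene, background and style details."
    )
    history = [prompt]
    rounds = 0
    # Stages 3 and 4 (assumed): self-evaluate the draft, then adjust it
    # until the score is high enough or the round budget is spent.
    while score(prompt) < threshold and rounds < max_rounds:
        prompt = llm(
            f"Revise this image prompt so it better matches the intent "
            f"{intent!r}: {prompt}"
        )
        history.append(prompt)
        rounds += 1
    return SculptResult(prompt=prompt, rounds=rounds, history=history)
```

Because `llm` and `score` are passed in as plain callables, the loop does not depend on any particular T2I or language model, which mirrors the model-independent design the paper claims. In practice `score` could be backed by an evaluation model or by user feedback; the summary leaves that detail open.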

Takeaways, Limitations

Takeaways:
Increased ease of use for T2I models: Automates complex prompt engineering to minimize user effort.
Improved image generation quality: Produces higher-quality images through automated prompt optimization.
Model independence: Compatible with various T2I models, which makes the framework easy to extend.
Industrial applicability: Makes T2I models more practical to use in various fields.
Limitations:
Lack of detailed description of the interactions and decision-making processes between agents: A more detailed account of how each agent works and how the agents interact is needed.
Scope and generalizability of the experiments: Additional experiments with different T2I models and user data are needed.
Validation of the performance and reliability of the self-evaluation agent: Further analysis of the accuracy and objectivity of the self-evaluation agent is required.
Dependence on user feedback: Performance can be significantly affected by the quality of user feedback.