This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.
This paper presents PromptSculptor, a framework that addresses a persistent problem in generative AI: despite recent advances, users still have to refine detailed prompts repeatedly to obtain high-quality images. PromptSculptor is a multi-agent framework composed of four specialized agents that automate the transformation of short, vague user prompts into comprehensive, refined ones. It uses Chain-of-Thought reasoning to infer hidden context and enrich scene and background details, then iteratively refines the prompt through self-evaluation and feedback-adjustment agents. Experimental results show that PromptSculptor improves output quality and reduces the number of iterations needed to reach user satisfaction. Its model-independent design allows integration with various text-to-image (T2I) models.
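The refinement loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not name all four agents or specify their interfaces, so the function `sculpt_prompt`, the `Evaluation` record, the `threshold` and `max_rounds` parameters, and the instruction strings are all hypothetical. The language model and the evaluator are passed in as plain callables, and the demo uses deterministic stand-ins instead of a real model. The sketch also assumes that the evaluator's critique drives each revision; user feedback could be supplied through the same channel.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface (not from the paper): instruction text in, reply text out.
TextModel = Callable[[str], str]


@dataclass
class Evaluation:
    score: float    # assumed range 0.0 to 1.0, higher is better
    critique: str   # what the evaluator thinks is missing


# Hypothetical interface: (user prompt, candidate prompt) -> Evaluation
Evaluator = Callable[[str, str], Evaluation]


def sculpt_prompt(user_prompt: str, model: TextModel, evaluate: Evaluator,
                  threshold: float = 0.8,
                  max_rounds: int = 3) -> Tuple[str, List[Evaluation]]:
    """Expand a short prompt, then revise it until the evaluator is satisfied."""
    # Stage 1: step-by-step reasoning about what the short prompt implies.
    intent = model(f"INTENT: Think step by step about what the user wants: {user_prompt}")
    # Stage 2: enrich the prompt with scene and background details.
    candidate = model(f"ENRICH: Expand '{user_prompt}' into a detailed image prompt. Intent: {intent}")

    history: List[Evaluation] = []
    for _ in range(max_rounds):
        # Stage 3: self-evaluation of the current candidate.
        result = evaluate(user_prompt, candidate)
        history.append(result)
        if result.score >= threshold:
            break
        # Stage 4: adjust the candidate using the critique. If the rounds run
        # out, the last revision is returned without a further evaluation.
        candidate = model(f"REVISE: Improve this prompt: {candidate}\nCritique: {result.critique}")
    return candidate, history


# Deterministic stand-ins so the sketch runs without any external service.
def fake_model(instruction: str) -> str:
    if instruction.startswith("INTENT:"):
        return "a calm, nostalgic mood"
    if instruction.startswith("ENRICH:"):
        return "a lighthouse at dusk"
    # REVISE: recover the previous candidate from the first line and extend it.
    previous = instruction.splitlines()[0].split(": ", 2)[-1]
    return previous + ", stormy sea in the background"


def fake_evaluate(user_prompt: str, candidate: str) -> Evaluation:
    if "background" in candidate:
        return Evaluation(0.9, "")
    return Evaluation(0.4, "no background described")


final_prompt, history = sculpt_prompt("lighthouse", fake_model, fake_evaluate)
print(final_prompt)
print([e.score for e in history])
```

Because the model and evaluator are ordinary callables, the same loop could in principle sit in front of any T2I backend, which is one way to read the model-independence claim; the threshold and round limit would need tuning in practice.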
Takeaways, Limitations
• Takeaways:
◦ Easier use of T2I models: Automates complex prompt engineering to minimize user effort.
◦ Model independence: Highly scalable, with compatibility across various T2I models.
◦ Industrial applicability: Increases the practicality of T2I models in various fields.
• Limitations:
◦ Insufficient description of agent interactions and decision-making: A more detailed account of how each agent works and how the agents interact is needed.
◦ Limited experimental scope and generalizability: Additional experiments with different T2I models and user data are needed.
◦ Unvalidated performance and reliability of the self-evaluation agent: Further analysis of the agent's accuracy and objectivity is required.
◦ Dependence on user feedback: Performance can be significantly affected by the quality of user feedback.