Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

Created by
  • Haebom

Authors

Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

Outline

DreamStory is an open-domain story visualization framework that combines a large language model (LLM) with a Multi-Subject consistent Diffusion model (MSD). The LLM generates descriptive prompts for the subjects and scenes of the story and annotates which subjects appear in each scene, so that later generation can keep those subjects consistent. The detailed subject descriptions are then used to create a portrait of each subject, and these portraits, together with their corresponding text, serve as multimodal anchors (guidance). MSD uses these anchors to generate the story scenes. It includes Masked Mutual Self-Attention (MMSA) and Masked Mutual Cross-Attention (MMCA) modules, which ensure appearance consistency with the reference images and semantic consistency with the reference text, respectively; both employ a masking mechanism to prevent subjects from blending into one another. The study also establishes the DS-500 benchmark for performance evaluation and verifies the effectiveness of DreamStory through subjective and objective evaluations.
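
The masking idea in the summary above can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal, pure-Python toy under the assumption that each scene token carries a label naming the subject (story character) it belongs to, or None for background, and that each reference-portrait token carries the label of its portrait. The function name masked_mutual_attention, the labels "girl" and "dog", and the 2-dimensional token vectors are all hypothetical. A scene token may attend to every scene token, but only to reference tokens of its own subject; everything else is masked out before the softmax.

```python
import math

def softmax(scores):
    # Masked positions carry -inf and receive exactly zero weight.
    peak = max(s for s in scores if s != -math.inf)
    exps = [0.0 if s == -math.inf else math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_mutual_attention(queries, scene_keys, scene_values,
                            ref_keys, ref_values,
                            scene_subject, ref_subject):
    """Toy masked attention over scene tokens plus reference tokens.

    scene_subject[i] is the (hypothetical) subject label of scene token i,
    or None for background; ref_subject[j] labels reference token j.
    """
    keys = scene_keys + ref_keys
    values = scene_values + ref_values
    n_scene = len(scene_keys)
    scale = math.sqrt(len(queries[0]))
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            # Block reference tokens that belong to a different subject.
            blocked = j >= n_scene and ref_subject[j - n_scene] != scene_subject[i]
            if blocked:
                scores.append(-math.inf)
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / scale)
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs, all_weights

# Three scene tokens (girl, dog, background) and two reference tokens.
scene_subject = ["girl", "dog", None]
ref_subject = ["girl", "dog"]
scene_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ref_tokens = [[2.0, 0.0], [0.0, 2.0]]

outputs, weights = masked_mutual_attention(
    scene_tokens, scene_tokens, scene_tokens,
    ref_tokens, ref_tokens,
    scene_subject, ref_subject,
)
```

In this toy setup the "girl" scene token puts zero weight on the "dog" portrait token and vice versa, and the background token ignores both portraits. A real diffusion model would apply such a mask inside its attention layers over latent features, with the subject regions estimated during generation; those details are outside what this summary states.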

Takeaways, Limitations

Takeaways:
Presents a new story visualization framework that combines an LLM with MSD.
Generates images that keep multiple subjects consistent across scenes.
Introduces the DS-500 benchmark for evaluating story visualization performance.
Validates the effectiveness of DreamStory through subjective and objective evaluations.
Limitations:
The scale and diversity of the DS-500 benchmark need further study.
Visualization performance on complex or ambiguous stories needs improvement.
Generalization to diverse real-world stories remains to be evaluated.