Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

Created by
  • Haebom

Authors

Qian Wang, Ziqi Huang, Ruoxi Jia, Paul Debevec, Ning Yu

Outline

MAViS is an end-to-end multi-agent collaborative framework for long-sequence video storytelling. It coordinates specialized agents across multiple stages, including script writing, shot design, character modeling, keyframe generation, video animation, and audio generation. At each stage, agents operate according to the 3E principles (Explore, Examine, and Enhance) to ensure the quality of intermediate outputs. To account for the limitations of current generative models, the authors also present script-writing guidelines that optimize compatibility between scripts and generation tools. Experimental results show that MAViS achieves state-of-the-art performance in assistant capability, visual quality, and video expressiveness. Its modular design makes it extensible with a variety of generative models and tools. From only a brief user prompt, it produces high-quality, expressive long-sequence video stories, enriching users' inspiration and creativity. MAViS is the only framework that delivers multimodal design outputs, i.e., videos with narration and background music.
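The staged pipeline with a per-stage 3E quality loop described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' actual implementation: the stage names, the `StageAgent` class, the score threshold, and the round limit are all assumptions made here for clarity.

```python
# Hypothetical sketch of a MAViS-style staged pipeline where each
# stage agent runs an Explore / Examine / Enhance (3E) loop.
# All names and numbers below are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, List

# Assumed stage order, following the summary above.
STAGES = ["script", "shot_design", "character", "keyframe", "animation", "audio"]

@dataclass
class StageAgent:
    name: str
    generate: Callable[[str], str]   # Explore: produce a candidate output
    score: Callable[[str], float]    # Examine: rate the candidate in [0, 1]
    refine: Callable[[str], str]     # Enhance: improve a weak candidate
    threshold: float = 0.8           # assumed acceptance bar
    max_rounds: int = 3              # assumed refinement budget

    def run(self, context: str) -> str:
        output = self.generate(context)              # Explore
        for _ in range(self.max_rounds):
            if self.score(output) >= self.threshold: # Examine
                break
            output = self.refine(output)             # Enhance
        return output

def run_pipeline(prompt: str, agents: List[StageAgent]) -> str:
    """Feed the brief user prompt through each stage in order;
    every stage consumes the previous stage's output."""
    context = prompt
    for agent in agents:
        context = agent.run(context)
    return context
```

The point of the sketch is the control flow: quality gating happens inside each stage (Examine/Enhance rounds) rather than once at the end, so a weak intermediate artifact is repaired before the next stage depends on it.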

Takeaways, Limitations

Takeaways:
Significantly improves assistant capability, visual quality, and expressiveness for long-sequence video creation.
We present a modular framework that provides extensibility with various generative models and tools.
Generates high-quality multimodal output (video, narration, background music) from simple user prompts.
We propose a way to overcome the limitations of generative models through script writing guidelines.
Limitations:
The paper does not discuss specific limitations. Future research could examine areas where improvement is likely, such as performance constraints for certain types of storytelling, computational cost, and limited diversity in generated content.