Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

Created by
  • Haebom

Authors

Qian Wang, Ziqi Huang, Ruoxi Jia, Paul Debevec, Ning Yu

Outline

MAViS is an end-to-end multi-agent collaborative framework for long-sequence video storytelling. It coordinates specialized agents across multiple stages: script writing, shot design, character modeling, keyframe generation, video animation, and audio generation. At each stage, the agents follow the 3E principle (Explore, Examine, Enhance) to ensure the completeness of intermediate outputs. Because current generative models have limited capabilities, the authors propose script writing guidelines that improve compatibility between scripts and generation tools. Experimental results show that MAViS achieves state-of-the-art performance in assistive capability, visual quality, and video expressiveness. Its modular design also makes it extensible to a variety of generative models and tools. From a brief user prompt, MAViS produces high-quality, expressive long-sequence video stories, supporting users' inspiration and creativity. According to the authors, MAViS is the only framework that provides multimodal design output, namely videos with narratives and background music.
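The staged pipeline described above can be pictured with a minimal sketch. It is not the authors' implementation: the `StageAgent` class, the `explore`/`check`/`enhance` hooks (a propose, check, revise loop standing in for the three 3E steps), the `max_rounds` cap, and the string-based toy agents are all hypothetical names chosen for illustration. Only the six stage names come from the summary.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage names as listed in the summary; everything else here is assumed.
STAGES = [
    "script writing",
    "shot design",
    "character modeling",
    "keyframe generation",
    "video animation",
    "audio generation",
]


@dataclass
class StageAgent:
    """Hypothetical agent for one stage, with one hook per 3E step."""
    name: str
    explore: Callable[[str], str]             # propose a candidate output
    check: Callable[[str], List[str]]         # list problems in the candidate
    enhance: Callable[[str, List[str]], str]  # revise the candidate


def run_stage(agent: StageAgent, upstream: str, max_rounds: int = 3) -> str:
    """Propose a candidate, then check and revise until no issues remain."""
    candidate = agent.explore(upstream)
    for _ in range(max_rounds):
        issues = agent.check(candidate)
        if not issues:
            break
        candidate = agent.enhance(candidate, issues)
    return candidate


def run_pipeline(agents: List[StageAgent], prompt: str) -> str:
    """Feed each stage's output into the next stage."""
    output = prompt
    for agent in agents:
        output = run_stage(agent, output)
    return output


def make_toy_agent(name: str) -> StageAgent:
    """Toy agent that works on strings instead of calling generative models."""
    def explore(upstream: str) -> str:
        return f"{upstream} -> {name}:draft"

    def check(candidate: str) -> List[str]:
        return ["unreviewed draft"] if candidate.endswith(":draft") else []

    def enhance(candidate: str, issues: List[str]) -> str:
        return candidate[: -len(":draft")] + ":final"

    return StageAgent(name, explore, check, enhance)


agents = [make_toy_agent(stage) for stage in STAGES]
story = run_pipeline(agents, "a fox finds a lantern")
print(story)
```

Because each stage only depends on the three hooks, swapping in a different generative model or tool would mean replacing one agent's callables, which is one way to read the modularity claim.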

Takeaways, Limitations

Takeaways:
Improves assistive capability, visual quality, and expressiveness in long-sequence video creation, with reported state-of-the-art results.
Presents a generation process built on multi-agent collaboration and the 3E principle.
A modular framework makes it extensible to various generative models and tools.
Produces high-quality multimodal output (video, narratives, background music) from simple prompts.
Supports users' creativity and inspiration.
Limitations:
Output quality depends on the capabilities of current generative models, which is why script writing guidelines are needed.
The types and capabilities of the specific generative models and tools may not be described in detail.
Further research may be needed on how well the approach generalizes to videos of different genres and styles.