MAViS is a multi-agent collaborative framework for long-form video storytelling that efficiently transforms ideas into visual narratives. It coordinates specialized agents across multiple stages: script writing, shot design, character modeling, keyframe generation, video animation, and audio generation. At each stage, the agents operate according to the 3E principle (Explore, Review, Enhance). To account for the functional limitations of current generative models, MAViS also proposes script writing guidelines that improve compatibility between scripts and generation tools. MAViS achieves state-of-the-art performance in assistive features, visual quality, and video expressiveness, and its modular design is extensible to various generative models and tools.
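
To make the stage structure concrete, the following is a minimal sketch of how a staged pipeline with an Explore, Review, Enhance loop might be organized. It is an illustration under stated assumptions, not the MAViS implementation: the names `StageAgents`, `run_stage`, `run_pipeline`, and `make_stub_stage` are hypothetical, the stub agents only manipulate strings in place of real generative models, and both the strictly sequential ordering of stages and the `max_rounds` cap are assumptions made here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage names taken from the description above; running them strictly in
# this order is an assumption of the sketch.
STAGES = [
    "script writing",
    "shot design",
    "character modeling",
    "keyframe generation",
    "video animation",
    "audio generation",
]


@dataclass
class StageAgents:
    """Hypothetical bundle of the three 3E roles for one stage."""
    name: str
    explore: Callable[[str], str]             # propose a candidate from upstream input
    review: Callable[[str], List[str]]        # return a list of issues (empty = accept)
    enhance: Callable[[str, List[str]], str]  # revise the candidate given the issues


def run_stage(agents: StageAgents, upstream: str, max_rounds: int = 3) -> str:
    """Explore once, then alternate Review and Enhance until accepted or capped."""
    candidate = agents.explore(upstream)
    for _ in range(max_rounds):
        issues = agents.review(candidate)
        if not issues:
            break
        candidate = agents.enhance(candidate, issues)
    return candidate


def run_pipeline(idea: str, stages: List[StageAgents], max_rounds: int = 3) -> str:
    """Feed each stage's output into the next stage."""
    artifact = idea
    for agents in stages:
        artifact = run_stage(agents, artifact, max_rounds)
    return artifact


def make_stub_stage(name: str) -> StageAgents:
    """Toy agents standing in for real models (assumption: string artifacts)."""
    draft, done = ":draft", ":enhanced"

    def explore(upstream: str) -> str:
        return f"{upstream} -> {name}{draft}"

    def review(candidate: str) -> List[str]:
        return ["needs polish"] if candidate.endswith(draft) else []

    def enhance(candidate: str, issues: List[str]) -> str:
        return candidate[: -len(draft)] + done

    return StageAgents(name, explore, review, enhance)


story = run_pipeline("a lighthouse keeper finds a map",
                     [make_stub_stage(n) for n in STAGES])
print(story)
```

Because each stage only depends on the `StageAgents` interface, a different generative model or tool could be swapped in by replacing the three callables for that stage, which is one way the modularity described above could be realized.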