This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Yan is a foundational framework for interactive video generation that encompasses the entire pipeline of simulation, generation, and editing. It consists of three core modules. For AAA-level simulations, we designed a high-compression, low-latency 3D-VAE and a KV-cache-based shift-window denoising inference process to achieve real-time 1080P/60FPS interactive simulation. For multimodal generation, we introduce a hierarchical autoregressive captioning method that infuses game-specific knowledge into an open-domain multimodal video diffusion model (VDM) and then transforms the VDM into a frame-by-frame, action-controlled, real-time, infinitely interactive video generator. Even when text and visual prompts originate from different domains, the model demonstrates strong generalization and allows for flexible mixing and composing of cross-domain styles and mechanisms based on user prompts. For multi-granularity editing, we propose a hybrid model that explicitly separates interactive mechanism simulation and visual rendering, enabling multi-granularity video content editing during text-based interaction. By integrating these modules, Yan evolves interactive video generation beyond an isolated function into a comprehensive AI-driven interactive creation paradigm, paving the way for the next generation of creative tools, media, and entertainment.
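To put the real-time 1080P/60FPS claim in concrete terms, the short sketch below works out the time budget per frame and the pixel throughput that such a rate implies. It assumes "1080P" means 1920x1080 pixels, which the summary does not state explicitly; the function names are illustrative and do not come from the paper.

```python
# Assumption: "1080P" is taken to mean 1920x1080 pixels (not stated in the summary).
WIDTH, HEIGHT, FPS = 1920, 1080, 60


def frame_budget_ms(fps: int) -> float:
    """Maximum time available to produce one frame at the given frame rate."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of pixels that must be produced every second."""
    return width * height * fps


if __name__ == "__main__":
    print(f"{frame_budget_ms(FPS):.2f} ms per frame")
    print(f"{pixels_per_second(WIDTH, HEIGHT, FPS):,} pixels per second")
```

Under these assumptions, every frame has to be ready in roughly 16.67 ms, including the time needed to react to the user's latest action, which is why the abstract stresses a low-latency 3D-VAE and a cached inference process.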
Takeaways, Limitations
• Takeaways:
◦ Real-time interactive video simulation at 1080P/60FPS with AAA-level quality.
◦ Multimodal (text and image) interactive video generation infused with game-specific knowledge.
◦ Flexible mixing and composing of cross-domain styles and mechanisms.
◦ Text-based, multi-granularity editing of video content.
◦ A comprehensive AI-driven paradigm for interactive content creation.
• Limitations:
◦ The information available so far does not identify specific limitations. Further research is needed to characterize the model's performance limits, computational resource requirements, and generalization limits.