Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Preacher: Paper-to-Video Agentic System

Created by
  • Haebom

Authors

Jingwei Liu, Ling Yang, Hao Luo, Fan Wang, Hongyan Li, Mengdi Wang

Outline

This paper addresses the "paper-to-video" task: converting research papers into structured video abstracts. Existing state-of-the-art video generation models are constrained by limited context windows, fixed video durations, limited style diversity, and an inability to represent domain-specific knowledge. To address these limitations, we propose Preacher, the first paper-to-video agentic system. Preacher takes a top-down approach to decompose, summarize, and reformulate a paper, then uses bottom-up video generation to synthesize diverse video segments into a coherent abstract. We define key scenes to align cross-modal representations and introduce Progressive Chain of Thought (P-CoT) for fine-grained iterative planning. Preacher generates high-quality video abstracts across five research areas, demonstrating expertise beyond that of existing video generation models. The code will be made available at https://github.com/GenVerse/Paper2Video.
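The top-down and bottom-up flow described above can be illustrated with a small sketch. This is not the authors' implementation: every name here (`KeyScene`, `decompose`, `plan_progressively`, `render_segment`, `paper_to_video`) is hypothetical, the summarizer is a trivial first-sentence stand-in, the "refine" callable stands in for an iterative planning step loosely inspired by P-CoT, and segments are rendered as text placeholders rather than actual video.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyScene:
    """Hypothetical unit that ties a text summary to a visual plan."""
    title: str
    summary: str
    visual_plan: str = ""


def decompose(paper_text: str) -> List[KeyScene]:
    """Top-down step (assumed format): blank-line-separated sections,
    first line of each section is its title."""
    scenes = []
    for block in paper_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue
        title, body = lines[0], " ".join(lines[1:])
        # Stand-in summarizer: keep only the first sentence of the section.
        summary = body.split(". ")[0].rstrip(".") + "." if body else ""
        scenes.append(KeyScene(title=title, summary=summary))
    return scenes


def plan_progressively(scene: KeyScene,
                       refine: Callable[[str], str],
                       rounds: int = 2) -> KeyScene:
    """Iteratively refine a visual plan; a simplified stand-in for
    fine-grained iterative planning, not the paper's P-CoT procedure."""
    plan = f"Show: {scene.summary}"
    for _ in range(rounds):
        plan = refine(plan)
    scene.visual_plan = plan
    return scene


def render_segment(scene: KeyScene) -> str:
    """Bottom-up step: a real system would call a video generator here;
    this sketch returns a text placeholder for the clip."""
    return f"[clip: {scene.title}] {scene.visual_plan}"


def paper_to_video(paper_text: str,
                   refine: Callable[[str], str],
                   rounds: int = 2) -> List[str]:
    """Decompose the paper, plan each key scene, then render segments in order."""
    scenes = decompose(paper_text)
    planned = [plan_progressively(s, refine, rounds) for s in scenes]
    return [render_segment(s) for s in planned]


# Toy input with two sections; the refine function just appends a marker.
paper = (
    "Introduction\nVideo models have short context. They struggle.\n\n"
    "Method\nWe use agents. They plan."
)
clips = paper_to_video(paper, refine=lambda p: p + " (+detail)")
```

In this sketch the key scene acts as the shared unit between the text side (summary) and the visual side (plan), which is one simple way to read the paper's idea of aligning cross-modal representations. The ordered list of clips stands in for the final assembled abstract.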

Takeaways, Limitations

Takeaways:
Proposes Preacher, a novel agent system that overcomes limitations of existing video generation models, such as limited context windows, fixed video duration, and limited style diversity.
Converts the main content of a paper into video by combining top-down and bottom-up approaches.
Aligns cross-modal representations through key scenes and performs fine-grained planning with Progressive Chain of Thought (P-CoT).
Generates high-quality video abstracts across five research areas.
Supports reproducibility and extensibility of the research through a planned open-source code release.
Limitations:
Specific metrics and analyses for evaluating Preacher's performance may be lacking.
Generalization performance across a wider range of research fields needs further validation.
Applicability and performance may be limited for papers with extremely complex content or highly specialized terminology.
Errors and biases that can arise during video generation may not be sufficiently analyzed.