Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Preacher: Paper-to-Video Agentic System

Created by
  • Haebom

Authors

Jingwei Liu, Ling Yang, Hao Luo, Fan Wang, Hongyan Li, Mengdi Wang

Outline

This paper addresses the "paper-to-video" task of converting research papers into structured video summaries. It highlights the limitations of existing state-of-the-art video generation models: limited context windows, fixed video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, the authors present "Preacher," the first paper-to-video agent system. Preacher uses a top-down approach to decompose, summarize, and reconstruct a paper, then combines the resulting video segments into a coherent summary video. It defines key scenes to align cross-modal representations and introduces Progressive Chain of Thought (P-CoT) for fine-grained iterative planning. Preacher generates high-quality video summaries across five research areas, demonstrating domain expertise beyond that of existing video generation models.
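To make the top-down decompose, summarize, and reconstruct flow more concrete, here is a toy Python sketch. It is not Preacher's implementation: the names `Scene`, `summarize`, `decompose`, `refine`, and `compose` are hypothetical, the "summarizer" simply keeps a section's first sentence instead of calling a model, and the single coarse-to-fine refinement pass is a simplified stand-in for P-CoT's iterative planning. The output is a text storyboard, not video.

```python
from dataclasses import dataclass, field

# Toy illustration only: NOT Preacher's actual pipeline or API.
# All names here are hypothetical.


@dataclass
class Scene:
    section: str                                     # paper section this key scene covers
    summary: str                                     # coarse, section-level summary
    shots: list[str] = field(default_factory=list)   # fine-grained plan, filled by refine()


def summarize(text: str) -> str:
    """Hypothetical stand-in for a model-based summarizer: keep the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def decompose(paper: dict[str, str]) -> list[Scene]:
    """Top-down step: create one coarse key scene per section, in document order."""
    return [Scene(section=name, summary=summarize(body)) for name, body in paper.items()]


def refine(scene: Scene, body: str, max_shots: int = 2) -> Scene:
    """Coarse-to-fine step (simplified stand-in for P-CoT): expand a scene into shots."""
    sentences = [s.strip().rstrip(".") for s in body.split(". ") if s.strip()]
    scene.shots = [f"[{scene.section}] {s}" for s in sentences[:max_shots]]
    return scene


def compose(scenes: list[Scene]) -> list[str]:
    """Reconstruct: concatenate per-scene shots into one ordered storyboard."""
    return [shot for scene in scenes for shot in scene.shots]


if __name__ == "__main__":
    paper = {
        "Introduction": "Videos help readers. Existing models are limited.",
        "Method": "We decompose the paper. We plan scenes. We render segments.",
    }
    scenes = [refine(s, paper[s.section]) for s in decompose(paper)]
    for shot in compose(scenes):
        print(shot)
```

Keeping the coarse plan (one scene per section) separate from the fine-grained shots mirrors the idea of planning top-down before filling in detail; in a real system each step would presumably be driven by a language model and each shot rendered as a video segment.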

Takeaways, Limitations

Takeaways:
Presents Preacher, the first paper-to-video agent system, which overcomes key limitations of existing video generation models.
Generates high-quality video summaries using a top-down approach and P-CoT.
Demonstrates its effectiveness across five research areas.
Open-source code supports reproducibility and further research.
Limitations:
Generalization to fields beyond the five research areas covered in the paper requires further verification.
Further analysis is needed on the efficiency and scalability of P-CoT.
The qualitative evaluation of the generated videos is partly subjective.