Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Paper2Video: Automatic Video Generation from Scientific Papers

Created by
  • Haebom

Authors

Zeyu Zhu, Kevin Qinghong Lin, Mike Zheng Shou

Outline

Video presentations have become an essential medium for research communication, yet producing even a short video of 2 to 10 minutes takes considerable time for slide design, recording, and editing. Automating this is harder than ordinary video generation. The input is a research paper with dense multimodal content (text, figures, tables), and the output must coordinate several aligned channels: slides, subtitles, speech, and a human presenter.

To address these challenges, the paper introduces Paper2Video, the first benchmark that pairs 101 research papers with author-created presentation videos, slides, and presenter metadata. It also defines four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) that measure how well a video conveys the paper's information to its audience.

Building on this benchmark, the authors propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation, effective layout refinement, cursor grounding, subtitling, speech synthesis, and presenter rendering. Experiments on Paper2Video show that PaperTalker produces more faithful and informative presentation videos than existing baselines, a substantial step toward automated, ready-to-use academic video generation.
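The summary does not say how PresentQuiz is scored. As a hypothetical sketch only, assume a quiz-style metric in which some answerer (for example, a vision-language model) watches the generated video, answers multiple-choice questions drawn from the paper, and the score is the fraction answered correctly. The `QuizItem` type and `quiz_accuracy` function below are illustrative names, not part of the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuizItem:
    """One multiple-choice question about the paper (hypothetical format)."""
    question: str
    options: tuple[str, ...]
    answer_index: int  # index of the correct option in `options`


def quiz_accuracy(items: list[QuizItem], predictions: list[int]) -> float:
    """Return the fraction of quiz questions answered correctly.

    `predictions[i]` is the option index chosen for `items[i]` by whatever
    answerer watched the video; producing those predictions is outside the
    scope of this sketch.
    """
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per quiz item")
    if not items:
        raise ValueError("quiz is empty")
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


quiz = [
    QuizItem("How many papers does the benchmark contain?", ("50", "101", "200"), 1),
    QuizItem("Which framework does the paper propose?", ("PaperTalker", "Paper2Video"), 0),
]
print(quiz_accuracy(quiz, [1, 1]))  # 0.5
```

Under this assumption, a higher score means the video transferred more of the paper's content to the viewer; the real metric may differ in how questions are generated and answered.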

Takeaways, Limitations

Takeaways:
  • Introduces a new benchmark and evaluation metrics for automating academic presentation video production.
  • Proposes PaperTalker, a multi-agent framework that improves the efficiency and quality of academic presentation video generation.
  • Offers a practical step toward automated academic video generation.
Limitations:
  • The paper does not explicitly discuss its limitations.