Academic presentation videos have become an essential medium for research communication, yet producing even a short 2-to-10-minute video demands significant time for slide design, recording, and editing. The core challenge is transforming a dense, multimodal research paper (text, figures, and tables) into multiple aligned channels: slides, subtitles, speech, and a human presenter. To address it, this paper introduces Paper2Video, the first benchmark pairing 101 research papers with author-created presentation videos, slides, and presenter metadata. We further design four tailored evaluation metrics, Meta Similarity, PresentArena, PresentQuiz, and IP Memory, to measure how well a generated video conveys the paper's information to its audience. Building on this framework, we propose PaperTalker, the first multi-agent framework for academic presentation video generation, which integrates slide generation, layout refinement, cursor anchoring, subtitle generation, speech synthesis, and presenter rendering. Experiments on Paper2Video show that the proposed approach produces more faithful and informative presentation videos than existing baselines, marking a substantial step toward automated, ready-to-use academic video generation.
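To make the staged pipeline concrete, the sketch below strings together the components named above (slide generation, layout refinement, cursor anchoring, subtitles, and synthesis/rendering) as a simple sequential orchestration. It is an illustrative sketch only: every function, class, and heuristic here is a hypothetical stand-in, not the authors' PaperTalker implementation, which uses multi-agent components rather than these stubs.

```python
"""Illustrative sketch of a presentation-video pipeline (hypothetical stand-ins)."""

from dataclasses import dataclass, field


@dataclass
class Slide:
    title: str
    bullets: list[str] = field(default_factory=list)
    cursor_anchor: tuple[float, float] | None = None  # normalized (x, y) position
    subtitle: str = ""


def generate_slides(paper_text: str) -> list[Slide]:
    # Hypothetical: a real system would use an LLM to segment the paper
    # into slide-sized chunks; here we split on blank lines.
    return [Slide(title=f"Section {i + 1}", bullets=[chunk.strip()[:80]])
            for i, chunk in enumerate(paper_text.split("\n\n")) if chunk.strip()]


def refine_layout(slides: list[Slide]) -> list[Slide]:
    # Hypothetical layout refinement: trim bullets so text fits the slide area.
    for slide in slides:
        slide.bullets = [b if len(b) <= 60 else b[:57] + "..." for b in slide.bullets]
    return slides


def anchor_cursor(slides: list[Slide]) -> list[Slide]:
    # Hypothetical cursor anchoring: point near the first bullet of each slide.
    for slide in slides:
        slide.cursor_anchor = (0.15, 0.35) if slide.bullets else None
    return slides


def write_subtitles(slides: list[Slide]) -> list[Slide]:
    # Hypothetical subtitling: derive a one-line narration per slide.
    for slide in slides:
        slide.subtitle = f"Now we discuss {slide.title.lower()}."
    return slides


def synthesize_and_render(slides: list[Slide]) -> str:
    # Hypothetical: speech synthesis and presenter rendering would run here;
    # this stub only reports how many segments were produced.
    return f"rendered {len(slides)} video segments"


def paper_to_video(paper_text: str) -> str:
    """Run the sketched stages end to end."""
    slides = generate_slides(paper_text)
    slides = refine_layout(slides)
    slides = anchor_cursor(slides)
    slides = write_subtitles(slides)
    return synthesize_and_render(slides)


if __name__ == "__main__":
    print(paper_to_video("Introduction...\n\nMethod...\n\nResults..."))
```

The sequential structure mirrors the order in which the channels must be aligned: slide content is fixed first, then visual layout and cursor cues, and only afterwards the narration and rendering that depend on them.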