The rapid proliferation of multimodal misinformation on social media has made video misinformation detection an increasingly urgent research problem, yet progress has been hampered by the lack of large-scale datasets. In this paper, we introduce FakeVV, a large-scale benchmark of over 100,000 video-text pairs, and propose Fact-R1, a novel framework that integrates deep reasoning with collaborative rule-based reinforcement learning. Fact-R1 is trained in three stages: misinformation chain-of-thought (CoT) instruction tuning, preference alignment via Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) with a verifiable reward function. Trained in this way, Fact-R1 exhibits reasoning behavior similar to that of advanced text-based reinforcement learning systems in complex multimodal misinformation settings. This study presents a new paradigm for misinformation detection that combines large-scale video understanding, reasoning-based alignment, and interpretable verification.