In this paper, we present a full-stack framework that leverages reinforcement learning to scale up reasoning in vision-language models (VLMs) to long-form videos. To this end, we integrate three core components: first, LongVideo-Reason, a large-scale dataset of 104,000 long-video QA pairs with high-quality reasoning annotations across diverse domains (sports, games, vlogs, etc.); second, a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and third, Multi-modal Reinforcement Sequence Parallelism (MR-SP), a training infrastructure for long-video RL that integrates sequence parallelism with a vLLM-based engine tailored for long videos, using cached video embeddings for efficient rollout and prefilling. Experimental results show that LongVILA-R1-7B achieves strong performance on video benchmarks, reaching 65.0% accuracy without subtitles and 70.7% with subtitles on VideoMME, and consistently outperforming LongVILA-7B across multiple benchmarks. Moreover, the performance of LongVILA-R1 improves steadily as the number of input video frames increases. The MR-SP system accelerates long-video RL training by up to 2.1x. Finally, we release a training system that supports RL training across various modalities (video, text, and audio), various models (the VILA and Qwen series), and even image and video generation models; it supports RL training on hour-long videos (e.g., 3,600 frames, roughly 256,000 tokens) on a single A100 node (8 GPUs).
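
To make the cached-embedding mechanism concrete, the sketch below illustrates the general pattern under stated assumptions; `CachedVideoPrefill`, `vision_tower`, `embed`, and `shard` are hypothetical names for illustration, not the paper's actual API. The idea is that RL samples many rollouts of the same long video per update step, so the frame embeddings can be computed once, cached, and sharded across sequence-parallel (SP) ranks for prefilling.

```python
# Minimal sketch (not the paper's implementation) of caching video embeddings
# so that repeated RL rollouts over the same long video skip the vision
# encoder. All class, method, and parameter names here are hypothetical.
import torch


class CachedVideoPrefill:
    """Encode each video once; reuse the embeddings for every rollout.

    Re-running the vision tower over thousands of frames for each rollout
    dominates the cost of long-video RL. Caching the frame embeddings
    amortizes that forward pass, and sharding them across sequence-parallel
    (SP) ranks keeps the per-GPU prefill sequence short.
    """

    def __init__(self, vision_tower: torch.nn.Module, num_sp_ranks: int):
        self.vision_tower = vision_tower        # frozen during rollout
        self.num_sp_ranks = num_sp_ranks        # sequence-parallel world size
        self._cache: dict[str, torch.Tensor] = {}

    @torch.no_grad()
    def embed(self, video_id: str, frames: torch.Tensor) -> torch.Tensor:
        """frames: (T, C, H, W) -> flattened tokens (T * tokens_per_frame, D)."""
        if video_id not in self._cache:
            emb = self.vision_tower(frames)     # one expensive forward pass
            self._cache[video_id] = emb.flatten(0, 1).contiguous()
        return self._cache[video_id]

    def shard(self, emb: torch.Tensor, sp_rank: int) -> torch.Tensor:
        """Give each SP rank a contiguous slice of the video-token sequence,
        so no single GPU prefills the full long-video context."""
        return emb.chunk(self.num_sp_ranks, dim=0)[sp_rank]
```

Under this reading, with 8 SP ranks an hour-long video of roughly 256,000 tokens would leave each GPU prefilling only about 32,000 tokens, which is consistent with fitting such videos on a single 8-GPU A100 node.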