This paper presents a study on advancing complex video reasoning with Large Vision-Language Models (LVLMs). To overcome the limitations of existing datasets, we propose ReWatch, a large-scale dataset consisting of three components: ReWatch-Caption, ReWatch-QA, and ReWatch-CoT. The reasoning traces in ReWatch-CoT are synthesized with a multi-agent ReAct framework that grounds each reasoning step in the video content. Building on this dataset, we develop ReWatch-R1 through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning with Verifiable Rewards (RLVR), incorporating an Observation & Reasoning (O&R) reward mechanism that scores both the correctness of the final answer and the consistency of the model's reasoning with the video content. Experimental results show that ReWatch-R1 achieves state-of-the-art performance on five video reasoning benchmarks.
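To make the O&R reward concrete, the sketch below shows one plausible way such a composite reward could be computed: a verifiable score for the final answer combined with a grounding score for the cited observations. This is a minimal illustration under stated assumptions, not the paper's implementation; the function names, the string-matching consistency check, and the weights are all hypothetical.

```python
# Minimal sketch of an Observation & Reasoning (O&R) style composite reward.
# All names and weights here are illustrative assumptions, not the paper's
# actual implementation; a real system would use a learned verifier rather
# than string matching for observation consistency.

def verify_answer(predicted: str, reference: str) -> float:
    """Verifiable answer reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0

def observation_consistency(reasoning: str, video_evidence: list[str]) -> float:
    """Fraction of cited observations supported by the video evidence."""
    cited = [ln for ln in reasoning.splitlines() if ln.startswith("Observation:")]
    if not cited:
        return 0.0
    supported = sum(any(ev in ln for ev in video_evidence) for ln in cited)
    return supported / len(cited)

def o_and_r_reward(predicted: str, reference: str, reasoning: str,
                   video_evidence: list[str],
                   w_answer: float = 0.7, w_obs: float = 0.3) -> float:
    """Weighted combination of answer correctness and observation grounding."""
    return (w_answer * verify_answer(predicted, reference)
            + w_obs * observation_consistency(reasoning, video_evidence))
```

Coupling the answer term with a grounding term in this way penalizes rollouts that reach a correct answer through reasoning inconsistent with the video, which is the stated motivation for the O&R design.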