FrameMind is an end-to-end framework, trained with reinforcement learning and built on Frame-Interactive Chain of Thought (FiCOT), that addresses a limitation of existing video understanding models: their reliance on fixed frame sampling strategies. It alternates between textual reasoning and active visual perception, calling tools to extract specific frames or video clips when it identifies a knowledge gap. The sampling policy is trained with Dynamic Resolution Frame Sampling (DRFS), which exposes it to a range of spatiotemporal trade-offs, and DRFS-GRPO, which learns from outcome-based rewards without frame-level annotations. FrameMind reports better performance than existing models on benchmarks such as MLVU and VideoMME.
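The alternation between reasoning and frame retrieval can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than FrameMind's actual interface: `FrameRequest`, `Answer`, `Transcript`, `extract_frames`, `interleaved_reasoning`, and `toy_policy` are all assumptions, the "video" is a list of frame labels, and the policy is a hand-written rule in place of a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class FrameRequest:
    """Hypothetical tool call: ask for frames at the given timestamps (seconds)."""
    timestamps: List[float]


@dataclass
class Answer:
    """Hypothetical terminal action: commit to a final answer."""
    text: str


Action = Union[FrameRequest, Answer]


@dataclass
class Transcript:
    """Running record of the question and every turn taken so far."""
    question: str
    turns: List[tuple] = field(default_factory=list)


def extract_frames(video: List[str], fps: float, timestamps: List[float]) -> List[str]:
    """Stand-in frame tool: map each timestamp to the nearest frame index, clamped to range."""
    frames = []
    for t in timestamps:
        idx = min(max(int(round(t * fps)), 0), len(video) - 1)
        frames.append(video[idx])
    return frames


def interleaved_reasoning(
    question: str,
    video: List[str],
    fps: float,
    policy: Callable[[Transcript], Action],
    max_turns: int = 4,
) -> Tuple[Optional[str], Transcript]:
    """Alternate between policy decisions and tool calls until an answer is given."""
    transcript = Transcript(question)
    for _ in range(max_turns):
        action = policy(transcript)
        if isinstance(action, Answer):
            transcript.turns.append(("answer", action.text))
            return action.text, transcript
        # The policy asked for more visual evidence: run the tool and record the result.
        frames = extract_frames(video, fps, action.timestamps)
        transcript.turns.append(("frames", frames))
    # Turn budget exhausted without a final answer.
    return None, transcript


def toy_policy(transcript: Transcript) -> Action:
    """Hand-written rule: look at the end of the clip once, then answer."""
    if not transcript.turns:
        return FrameRequest(timestamps=[2.0, 3.0])
    seen = transcript.turns[-1][1]
    return Answer("cat" if "cat" in seen else "unknown")


video = ["dog", "dog", "cat", "cat"]  # one label per frame, sampled at 1 fps
answer, transcript = interleaved_reasoning(
    "What animal appears at the end?", video, 1.0, toy_policy
)
print(answer)
```

In this sketch the loop, not the policy, owns the turn budget, so a policy that never commits to an answer still terminates. With outcome-based rewards, only the final `answer` would be scored, and no per-frame labels would be needed.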