This paper studies how to effectively integrate supervised fine-tuning (SFT) and reinforcement learning (RL) to improve the reasoning ability of large language models (LLMs). From an entropy-based perspective, we comprehensively analyze token distributions, learning dynamics, and integration mechanisms, revealing that SFT induces coarse-grained, global changes to the LLM policy distribution, while RL performs fine-grained, selective optimization, and that entropy serves as an important indicator of training effectiveness. Based on these observations, this paper proposes Supervised Reinforcement Fine-Tuning (SRFT), a single-stage method that unifies the two fine-tuning paradigms through an entropy-aware weighting mechanism. Rather than following a sequential two-stage pipeline, SRFT applies SFT and RL simultaneously, directly optimizing the LLM with both demonstrations and self-exploration rollouts. Extensive experiments show that SRFT achieves an average accuracy of 59.1%, outperforming zero-RL methods by 9.0% on five mathematical reasoning benchmarks and by 10.9% on three out-of-distribution benchmarks.
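
To make the single-stage objective concrete, the following is a minimal PyTorch sketch of how an SFT loss on demonstrations and an RL loss on self-exploration rollouts might be combined under an entropy-aware weight. The exponential weighting function and the REINFORCE-style surrogate used here are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# Minimal sketch of a single-stage SFT + RL objective with an entropy-aware
# weight. The exact weighting schedule and RL objective in SRFT are not given
# in the abstract; the forms below are illustrative assumptions.
import torch
import torch.nn.functional as F


def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token entropy of the current policy distribution."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean()


def srft_style_loss(demo_logits, demo_targets,
                    rollout_logits, rollout_actions, advantages):
    """Combine an SFT loss on demonstrations with an RL loss on rollouts,
    weighting the SFT term by the current policy entropy (assumed form)."""
    # Supervised term: token-level cross-entropy on demonstration data.
    sft_loss = F.cross_entropy(
        demo_logits.view(-1, demo_logits.size(-1)), demo_targets.view(-1)
    )

    # RL term: REINFORCE-style surrogate on rollout tokens (a stand-in for
    # the actual policy-gradient objective).
    log_probs = F.log_softmax(rollout_logits, dim=-1)
    act_log_probs = log_probs.gather(-1, rollout_actions.unsqueeze(-1)).squeeze(-1)
    rl_loss = -(advantages * act_log_probs).mean()

    # Entropy-aware weight: down-weight the SFT term when policy entropy is
    # high (hypothetical exponential decay).
    entropy = token_entropy(rollout_logits).detach()
    w_sft = torch.exp(-entropy)

    return w_sft * sft_loss + rl_loss


# Toy usage with random tensors (batch=2, sequence length=8, vocab=16).
if __name__ == "__main__":
    demo_logits = torch.randn(2, 8, 16, requires_grad=True)
    demo_targets = torch.randint(0, 16, (2, 8))
    rollout_logits = torch.randn(2, 8, 16, requires_grad=True)
    rollout_actions = torch.randint(0, 16, (2, 8))
    advantages = torch.randn(2, 8)

    loss = srft_style_loss(demo_logits, demo_targets,
                           rollout_logits, rollout_actions, advantages)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```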