This paper reinterprets Behavior Cloning (BC), a classical supervised learning method, from a Reinforcement Learning (RL) perspective, showing that it maximizes a lower bound on the RL objective in a sparse-reward setting. We demonstrate that standard supervised fine-tuning (SFT) can be understood as maximizing this lower bound, and propose importance-weighted supervised fine-tuning (iw-SFT), a simple modification of SFT that optimizes a tighter approximation of the RL objective. iw-SFT can outperform SFT and extends naturally to training on data annotated with quality scores. Experimental results show that iw-SFT is competitive with advanced RL algorithms on large language models and on continuous control tasks, achieving 66.7% on the AIME 2024 benchmark.
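To make the idea concrete, the following is a minimal sketch of a generic importance-weighted SFT loss, not necessarily the exact objective used in the paper: each curated sequence's log-likelihood is reweighted by the ratio of its probability under the current policy to its probability under the reference (data-generating) policy. The detaching and clipping of the weight are stabilization assumptions, and the function and argument names are illustrative.

```python
import torch

def iw_sft_loss(policy_logprobs: torch.Tensor,
                ref_logprobs: torch.Tensor,
                clip_max: float = 10.0) -> torch.Tensor:
    """Sketch of an importance-weighted SFT loss over curated sequences.

    policy_logprobs: per-sequence log-probabilities under the current policy pi_theta.
    ref_logprobs:    per-sequence log-probabilities under the reference (data) policy.
    """
    # Importance weight w = pi_theta(tau) / pi_ref(tau), treated as a constant for
    # the gradient (detached) and clipped to avoid exploding weights (assumptions).
    weights = torch.exp(policy_logprobs.detach() - ref_logprobs).clamp(max=clip_max)
    # Standard SFT maximizes mean log pi_theta; iw-SFT reweights each sequence,
    # emphasizing data the current policy already assigns relatively high probability.
    return -(weights * policy_logprobs).mean()
```

In practice, the per-sequence log-probabilities would be obtained by summing token log-probabilities under the trainable model and a frozen reference model; with all weights equal to one, the loss reduces to ordinary SFT.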