This paper theoretically models how the performance of large language models (LLMs) evolves during self-improvement, a technique in which an LLM improves its own performance without relying on external data. Specifically, we model the dynamics of self-improvement training using the solver-verifier gap, i.e., the difference between an LLM's ability to solve problems and its ability to verify solutions, and propose a method for modeling the entire training trajectory based on these dynamics. Experiments demonstrate the effectiveness of our theoretical framework, and we use it to analyze how external data affects these dynamics. We find that, when the available external data is limited, the point in training at which it is introduced does not significantly affect final performance.
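As a minimal illustrative sketch of the quantity involved (the notation and the proportional-growth assumption below are introduced here purely for exposition and need not coincide with the paper's formal model): writing $S(t)$ for the model's solving accuracy and $V(t)$ for its verification accuracy after $t$ rounds of self-improvement, the solver-verifier gap can be taken as
$$
G(t) = V(t) - S(t),
$$
and one simple assumption, used only as an example of gap-driven dynamics, is that the per-round improvement scales with this gap, $S(t+1) - S(t) \propto G(t)$, so that progress slows as solving ability catches up with verification ability.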