This paper proposes a method to improve the long-horizon reasoning capabilities of large language models by leveraging existing short-horizon reasoning data, addressing the degraded performance these models show on long-horizon reasoning tasks. Specifically, we compose simple problems into complex multi-step dependency chains of arbitrary length, train the model with outcome-only rewards, and apply a curriculum that automatically increases complexity to keep reinforcement learning (RL) training scalable. With this method, a model trained on composed sixth-grade math problems (GSM8K) achieves up to a 2.06x accuracy improvement on longer, competition-level benchmarks (GSM-Symbolic, MATH-500, and AIME), and generalizes to diverse ReasoningGym domains and long-context benchmarks.
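To make the composition step concrete, the sketch below shows one plausible way such dependency chains could be built from simple problems: each step's numeric answer is injected as a named variable into the next step's statement, only the final answer is kept as the reward target (matching an outcome-only reward), and a single chain-length parameter is the knob a curriculum would grow over training. This is a minimal illustration under assumed interfaces; the `Problem` record, `chain_problems` helper, and the toy problem pool are hypothetical names, not the paper's released code.

```python
from dataclasses import dataclass
import random
from typing import Callable

@dataclass
class Problem:
    # Hypothetical atomic problem: a statement template with an "{x}" slot
    # for the injected variable, plus a function giving its numeric answer.
    template: str
    answer_fn: Callable[[int], int]

def chain_problems(pool, chain_length, seed=0):
    """Compose `chain_length` simple problems into one long-horizon problem.

    Each step's answer becomes the input of the next step, so solving the
    composed problem requires solving every step in order. Only the final
    answer is returned as supervision, mirroring an outcome-only reward.
    """
    rng = random.Random(seed)
    steps = [rng.choice(pool) for _ in range(chain_length)]
    value = rng.randint(2, 9)                      # seed value for the first step
    parts = [f"Let x0 = {value}."]
    for i, step in enumerate(steps):
        parts.append(f"Step {i + 1}: " + step.template.format(x=f"x{i}"))
        parts.append(f"Call the result x{i + 1}.")
        value = step.answer_fn(value)              # ground-truth answer of this step
    statement = " ".join(parts) + f" Report x{len(steps)}."
    return statement, value                        # only the final answer is rewarded

# Toy pool of "simple" problems; real data would come from GSM8K-style items.
pool = [
    Problem("A box holds {x} apples and a crate holds 3 such boxes. "
            "How many apples are in a crate?", lambda x: 3 * x),
    Problem("Sam has {x} marbles and gives away 2. How many remain?",
            lambda x: x - 2),
]

# A curriculum would start with short chains and increase chain_length over training.
text, gold = chain_problems(pool, chain_length=4)
print(text)
print("gold answer:", gold)
```

In this sketch the curriculum reduces to scheduling `chain_length`, which is what allows problem difficulty to be increased automatically without collecting new long-horizon data.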