This paper proposes a novel approach to overcome the limitations of the prevailing two-stage pipeline for improving the reasoning performance of large language models (LLMs): supervised fine-tuning (SFT) followed by reinforcement learning (RL). The approach views SFT and RL as complementary reward signals rather than separate stages. To address the drawbacks of existing methods, such as catastrophic forgetting and a suboptimal trade-off between imitation and exploration, we propose Adaptive Meta-Fine-Tuning (AMFT), a single-stage algorithm that, through the lens of implicit rewards, learns the optimal balance between SFT's path-level rewards and RL's outcome-based rewards. At the core of AMFT is a meta-gradient adaptive weight controller that treats the SFT-RL balance as a learnable parameter and dynamically optimizes it to maximize long-term task performance. Regularized by policy entropy for stability, the controller autonomously discovers an effective training curriculum. AMFT achieves state-of-the-art performance on a variety of benchmarks, including mathematical reasoning, abstract visual reasoning (General Points), and vision-language navigation (V-IRL), and generalizes well to out-of-distribution (OOD) tasks. Through ablation studies and an analysis of training dynamics, we show that the meta-learned controller is crucial to AMFT's stability, sample efficiency, and final performance.
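To make the single-stage objective concrete, one schematic way to write the balance AMFT learns is sketched below; the symbols $\lambda$, $\alpha$, $\eta_\lambda$, $\beta$, and $J_{\text{val}}$ are notational assumptions introduced here for exposition and need not match the paper's exact formulation:

\[
\mathcal{L}_{\text{AMFT}}(\theta;\lambda) \;=\; \lambda\,\mathcal{L}_{\text{SFT}}(\theta) \;+\; (1-\lambda)\,\mathcal{L}_{\text{RL}}(\theta) \;-\; \beta\,\mathcal{H}\!\left(\pi_\theta\right),
\qquad
\lambda \;\leftarrow\; \lambda \;+\; \eta_\lambda\,
\frac{\partial\, J_{\text{val}}\!\left(\theta - \alpha\,\nabla_\theta \mathcal{L}_{\text{AMFT}}(\theta;\lambda)\right)}{\partial \lambda},
\]

where $\lambda \in [0,1]$ is the adaptive SFT-RL weight, $\mathcal{H}(\pi_\theta)$ is the policy-entropy regularizer that stabilizes training, and $J_{\text{val}}$ stands for a long-horizon task-performance objective evaluated after one inner gradient step, so that $\lambda$ is updated by a meta-gradient taken through that step.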