This paper presents Multi-fidelity Policy Gradients (MFPG), a framework that addresses a central challenge for reinforcement learning (RL): algorithms either require massive amounts of data from the target (real-world) environment or must rely on costly high-fidelity simulation. MFPG constructs an unbiased, variance-reduced estimator of the on-policy policy gradient by combining a small amount of target-environment data with control variates built from abundant, cheap, low-fidelity simulation data. Concretely, the framework is instantiated as a multi-fidelity variant of the classic REINFORCE algorithm, which is shown to converge asymptotically in the target environment under standard assumptions and to achieve faster finite-sample convergence than training on high-fidelity data alone. Evaluated on simulated robotics benchmarks with limited high-fidelity data and abundant low-fidelity data, MFPG delivers superior performance despite its simplicity and minimal tuning overhead. MFPG remains effective even when the low-fidelity environment is a poor match for the target environment or provides no reward signal, improving the efficiency of sim-to-real transfer and offering a way to trade off policy performance against data-collection cost.
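To illustrate the control-variate idea behind the estimator, the sketch below is a minimal, hypothetical Python example rather than the paper's implementation: it assumes per-episode REINFORCE gradient estimates are already available as arrays, and the function name `mfpg_gradient`, the seed-paired rollout scheme, and the per-coordinate coefficient estimate are all illustrative assumptions. The low-fidelity correction term has zero mean (both low-fidelity batches estimate the same quantity), so the combined estimator remains unbiased for the high-fidelity gradient while its variance shrinks when the paired estimates are correlated.

```python
import numpy as np


def mfpg_gradient(g_hi, g_lo_paired, g_lo_extra, c=None):
    """Control-variate combination of per-episode policy-gradient estimates.

    g_hi        : (n_hi, d) gradient estimates from high-fidelity (target) episodes.
    g_lo_paired : (n_hi, d) low-fidelity estimates from rollouts correlated with
                  the high-fidelity episodes (e.g. shared seeds or actions).
    g_lo_extra  : (n_lo, d) additional cheap low-fidelity estimates, n_lo >> n_hi.
    c           : control-variate coefficient; if None, a per-coordinate
                  regression coefficient is estimated from the batch (which
                  introduces a small bias in practice).
    """
    if c is None:
        cov = ((g_hi - g_hi.mean(0)) * (g_lo_paired - g_lo_paired.mean(0))).mean(0)
        var = g_lo_paired.var(0) + 1e-12
        c = cov / var
    # Baseline: mean over all low-fidelity samples (paired + extra).
    baseline = np.vstack([g_lo_paired, g_lo_extra]).mean(0)
    # Unbiased for the high-fidelity gradient when c is fixed.
    return g_hi.mean(0) - c * (g_lo_paired.mean(0) - baseline)


if __name__ == "__main__":
    # Synthetic demo: correlated "gradient" samples with a known true mean.
    rng = np.random.default_rng(0)
    d, n_hi, n_lo, trials = 4, 16, 512, 2000
    true_grad = np.ones(d)
    naive, mf = [], []
    for _ in range(trials):
        noise = rng.normal(size=(n_hi, d))
        g_hi = true_grad + noise
        # Paired low-fidelity samples share the noise, hence are correlated.
        g_lo_paired = 0.5 + 0.9 * noise + 0.1 * rng.normal(size=(n_hi, d))
        # Extra low-fidelity samples: same marginal distribution, independent.
        g_lo_extra = 0.5 + np.sqrt(0.9**2 + 0.1**2) * rng.normal(size=(n_lo, d))
        naive.append(g_hi.mean(0))
        mf.append(mfpg_gradient(g_hi, g_lo_paired, g_lo_extra))
    print("variance of naive high-fidelity estimator:", np.var(naive, axis=0).mean())
    print("variance of multi-fidelity estimator:     ", np.var(mf, axis=0).mean())
```

Running the demo shows both estimators centered on the same true gradient, with the multi-fidelity estimate exhibiting substantially lower variance, which is the mechanism by which limited high-fidelity data can be stretched further.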