In this paper, we propose a Noise-Conditioned Deterministic Policy Optimization (NCDPO) framework to address the __T7488__ of diffusion policies. Diffusion policies can represent a wide range of skills thanks to their strong expressive power, but they may produce suboptimal trajectories or even severe errors when demonstration data are scarce or of low quality. Existing reinforcement-learning-based fine-tuning methods struggle to apply PPO to diffusion models effectively because estimating action likelihoods through the denoising process is computationally intractable. NCDPO treats each denoising step as a differentiable transformation conditioned on pre-sampled noise, which makes likelihood estimation tractable and allows gradients to be backpropagated through all diffusion steps. Experimental results show that NCDPO outperforms existing methods in both sample efficiency and final performance across a range of benchmarks, including continuous robot control and multi-agent game scenarios. In particular, it matches the sample efficiency of MLP+PPO when training from randomly initialized policies and remains robust to the number of diffusion steps.
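To make the noise-conditioning idea concrete, the sketch below (PyTorch) illustrates the core property the abstract describes: once the per-step Gaussian noises are drawn up front, the reverse denoising chain becomes a deterministic, differentiable map from observation and noise to action, so a policy-optimization loss can be backpropagated through every diffusion step. This is only a minimal illustration under assumed names and a simplified reverse update (DenoiserMLP, K, obs_dim, act_dim, and the critic-based loss are all illustrative, not the paper's actual objective or implementation).

```python
import torch
import torch.nn as nn

obs_dim, act_dim, K = 8, 2, 5  # assumed dimensions and number of diffusion steps

class DenoiserMLP(nn.Module):
    """Predicts the noise to remove at step k, conditioned on the observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, a_k, k):
        k_feat = torch.full_like(a_k[:, :1], float(k) / K)  # scalar step embedding
        return self.net(torch.cat([obs, a_k, k_feat], dim=-1))

def sample_action(policy, obs, noises, alphas):
    """Run the reverse diffusion chain with pre-sampled noises.

    Because `noises` is fixed in advance, the chain is a deterministic,
    differentiable function of the policy parameters: gradients flow through
    every denoising step. The update rule here is deliberately simplified.
    """
    a = noises[-1]                                   # a_K ~ N(0, I)
    for k in reversed(range(K)):
        eps_hat = policy(obs, a, k)                  # predicted noise at step k
        a = (a - (1 - alphas[k]).sqrt() * eps_hat) / alphas[k].sqrt()
        if k > 0:
            a = a + 0.1 * noises[k]                  # inject the pre-sampled noise
    return torch.tanh(a)                             # squash to the action range

# Usage: backpropagate a simple differentiable objective through all K steps.
# A learned critic stands in for the RL objective; the paper's PPO-style loss
# would replace it.
policy = DenoiserMLP()
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

obs = torch.randn(32, obs_dim)
noises = [torch.randn(32, act_dim) for _ in range(K + 1)]  # pre-sampled per-step noise
alphas = torch.linspace(0.9, 0.99, K)

actions = sample_action(policy, obs, noises, alphas)
loss = -critic(torch.cat([obs, actions], dim=-1)).mean()
loss.backward()  # gradients reach the denoiser parameters through every diffusion step
```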