This paper presents a novel noise-based learning rule that mimics mechanisms of biological neural systems, which learn efficiently from delayed rewards, and that remains applicable in resource-constrained environments or in systems containing non-differentiable components. To address the limitations of traditional reward-modulated Hebbian learning (RMHL) in settings involving time delays and hierarchical processing, we propose an algorithm that uses the reward prediction error as its optimization objective and incorporates an eligibility trace to enable retrospective credit assignment. The method relies only on locally available information, and we show experimentally that it outperforms RMHL and achieves performance comparable to backpropagation (BP) on reinforcement learning tasks with both immediate and delayed rewards. Although its convergence is slow, it demonstrates applicability to low-power adaptive systems where energy efficiency and biological plausibility are crucial. Furthermore, it provides insight into how dopamine-like signals and synaptic stochasticity contribute to learning in biological networks.
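To make the ingredients named above concrete, the following is a minimal sketch, not the authors' exact formulation, of a noise-based, reward-modulated update that combines an eligibility trace with a reward prediction error and uses only locally available quantities. All names and hyperparameters (lr, noise_std, trace_decay, baseline_decay) are illustrative assumptions.

```python
# Minimal sketch (assumed form, not the paper's exact rule): noise-driven
# exploration + eligibility trace + reward prediction error (RPE).
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 8, 2
W = rng.normal(0.0, 0.1, size=(n_out, n_in))  # synaptic weights
trace = np.zeros_like(W)                      # eligibility trace (per synapse)
r_bar = 0.0                                   # running reward baseline; RPE = r - r_bar

lr, noise_std, trace_decay, baseline_decay = 0.05, 0.1, 0.9, 0.99

def step(x, reward):
    """One plasticity step using only local pre/post activity and a global RPE."""
    global W, trace, r_bar
    xi = rng.normal(0.0, noise_std, size=n_out)   # exploratory noise on outputs
    y = W @ x + xi                                # perturbed activity

    # Hebbian-style eligibility: correlate the noise with presynaptic input and
    # decay it over time, so a later reward can credit earlier activity.
    trace = trace_decay * trace + np.outer(xi, x)

    # Dopamine-like global signal: reward prediction error gates the update.
    rpe = reward - r_bar
    W += lr * rpe * trace
    r_bar = baseline_decay * r_bar + (1.0 - baseline_decay) * reward
    return y
```

Because the update depends only on each synapse's own trace and a single scalar RPE broadcast to all synapses, no gradients need to be propagated through the network, which is what makes this family of rules usable with non-differentiable components.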