In this paper, we propose a novel optimization technique, Divergence-driven Zeroth-Order optimization (DiZO), to overcome the limitations of memory-efficient zeroth-order (ZO) optimization in fine-tuning large-scale language models (LLMs). Existing ZO methods are memory-efficient because they estimate gradients using only forward passes, but they converge significantly more slowly and reach lower accuracy than first-order (FO) methods. DiZO analyzes the differences in update patterns between FO and ZO optimization and introduces a layer-wise, divergence-driven adaptation method that adjusts the magnitude of each layer's update according to its optimization needs. Experimental results show that DiZO significantly reduces the number of iterations required to converge, cutting training GPU time by up to 48% across various datasets, and that it outperforms existing ZO techniques in fine-tuning models such as RoBERTa-large and the OPT and Llama series, in some cases even surpassing memory-intensive FO fine-tuning.
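To make the two ingredients mentioned above concrete, the toy sketch below illustrates (i) a forward-pass-only, SPSA-style ZO gradient estimate and (ii) a per-layer rescaling of the resulting update. It is an illustrative assumption, not DiZO's actual algorithm: the function names (`zo_gradient`, `layerwise_scale`), the norm-capping rule, and the quadratic stand-in loss are all hypothetical choices made only to show the general shape of layer-wise adaptation on top of a ZO estimator.

```python
# Toy sketch (assumptions, not the paper's implementation): zeroth-order gradient
# estimation from two forward passes, plus a hypothetical per-layer update rescaling.
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": two layers of parameters and a quadratic loss (stand-in for an LLM loss).
params = {"layer1": rng.normal(size=8), "layer2": rng.normal(size=8)}
targets = {"layer1": np.ones(8), "layer2": -np.ones(8)}

def loss(p):
    return sum(np.sum((p[k] - targets[k]) ** 2) for k in p)

def zo_gradient(p, eps=1e-3):
    """SPSA-style estimate: two forward passes with a shared random perturbation."""
    z = {k: rng.normal(size=v.shape) for k, v in p.items()}
    p_plus = {k: v + eps * z[k] for k, v in p.items()}
    p_minus = {k: v - eps * z[k] for k, v in p.items()}
    proj = (loss(p_plus) - loss(p_minus)) / (2 * eps)  # scalar projected gradient
    return {k: proj * z[k] for k in p}                 # per-layer gradient estimate

def layerwise_scale(update, max_norm=1.0):
    """Hypothetical layer-wise adaptation: cap each layer's update magnitude."""
    out = {}
    for k, g in update.items():
        norm = np.linalg.norm(g)
        out[k] = g * (max_norm / norm) if norm > max_norm else g
    return out

lr = 0.05
for step in range(200):
    grad = zo_gradient(params)         # forward passes only, no backpropagation
    grad = layerwise_scale(grad)       # adjust update size per layer
    params = {k: params[k] - lr * grad[k] for k in params}

print(f"final loss: {loss(params):.4f}")
```

The point of the sketch is only that the memory footprint stays at inference level (no activation storage for backpropagation), while the layer-wise step gives each layer its own effective update size; DiZO's divergence-driven projections serve this role in the paper.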