Text-to-image (T2I) diffusion models excel at generating high-fidelity images, but they suffer from limited diversity because strong text guidance drives samples toward similar modes. This paper proposes Contrastive Noise Optimization (CNO), a novel method that addresses this issue. Unlike existing methods that manipulate intermediate latent variables or text conditions, CNO manipulates the initial noise to generate diverse outputs. Specifically, it optimizes the placement of the noisy latent variables by leveraging a contrastive loss defined in the Tweedie data space. The contrastive optimization maximizes diversity by pushing instances within a batch away from each other, while maintaining fidelity by anchoring them to a reference sample. Experiments on multiple T2I backbones demonstrate that the proposed method attains a strong quality-diversity Pareto frontier and is robust to hyperparameter selection.
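
The following is a minimal sketch of the idea described above, not the authors' implementation: a batch of initial noise latents is optimized so that their one-step Tweedie (denoised) estimates repel each other while staying close to a fixed reference estimate. The names `eps_model`, `alpha_bar_T`, the cosine-similarity repulsion term, and the loss weights are illustrative assumptions.

```python
# Hypothetical sketch of contrastive noise optimization over initial latents.
import torch
import torch.nn.functional as F

def tweedie_x0(z_T, eps_model, alpha_bar_T, cond):
    """One-step Tweedie estimate of the clean sample from noisy latents z_T."""
    eps = eps_model(z_T, cond)                                   # predicted noise
    return (z_T - (1.0 - alpha_bar_T) ** 0.5 * eps) / (alpha_bar_T ** 0.5)

def contrastive_noise_optimization(z_T, z_ref, eps_model, alpha_bar_T, cond,
                                    steps=50, lr=1e-2, lam=1.0):
    """Optimize initial noise: push batch members apart in Tweedie space,
    while anchoring them to a reference sample for fidelity."""
    z_T = z_T.clone().requires_grad_(True)
    opt = torch.optim.Adam([z_T], lr=lr)
    with torch.no_grad():
        # Fixed anchor: Tweedie estimate of a reference latent (shape (1, C, H, W)).
        x0_ref = tweedie_x0(z_ref, eps_model, alpha_bar_T, cond).flatten(1)
    for _ in range(steps):
        x0 = tweedie_x0(z_T, eps_model, alpha_bar_T, cond).flatten(1)
        # Repulsion: mean pairwise cosine similarity within the batch (minimized).
        sim = F.cosine_similarity(x0.unsqueeze(1), x0.unsqueeze(0), dim=-1)
        off_diag = sim - torch.eye(sim.size(0), device=sim.device)
        repel = off_diag.sum() / (sim.numel() - sim.size(0))
        # Attraction: keep each instance near the reference estimate (fidelity).
        anchor = F.mse_loss(x0, x0_ref.expand_as(x0))
        loss = repel + lam * anchor
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z_T.detach()
```

After optimization, the returned latents are fed to the standard sampler in place of i.i.d. Gaussian noise; the weight `lam` trades off the repulsion (diversity) and anchoring (fidelity) terms in this sketch.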