This paper proposes a novel 3D style transfer pipeline that leverages knowledge from pre-trained 2D diffusion models. Existing 3D style transfer methods struggle to extract and transfer high-dimensional style semantics, and their results often suffer from structural ambiguity that makes objects in the stylized scene difficult to identify. The proposed pipeline consists of two steps: generating stylized renderings of dominant viewpoints, and then transferring them to the 3D representation. Specifically, cross-view style alignment enables feature interaction across multiple dominant viewpoints, and instance-level style transfer propagates the consistency of the stylized dominant viewpoints to the 3D representation, yielding structurally and visually consistent stylization results. Experiments on a variety of scenes demonstrate that the proposed method outperforms existing state-of-the-art methods.
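
To make the cross-view style alignment step concrete, the sketch below shows one plausible realization: features from all dominant viewpoints are pooled into a shared key/value sequence so that each view can attend to every other view before stylization. This is a minimal illustration only; the module name, tensor shapes, and the use of multi-head attention are assumptions for exposition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossViewStyleAlignment(nn.Module):
    """Hypothetical cross-view alignment: each view's features attend to
    all dominant viewpoints so style statistics are shared across views."""
    def __init__(self, feat_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (num_views, tokens, feat_dim), one token grid per rendering
        v, t, d = view_feats.shape
        # Flatten all views into one key/value sequence so every view can
        # interact with features from every other dominant viewpoint.
        kv = view_feats.reshape(1, v * t, d).expand(v, -1, -1)
        aligned, _ = self.attn(query=view_feats, key=kv, value=kv)
        return view_feats + aligned  # residual keeps per-view content intact

# Example: align features of 4 dominant viewpoints, each with 256 tokens.
feats = torch.randn(4, 256, 64)
aligned = CrossViewStyleAlignment(feat_dim=64)(feats)
print(aligned.shape)  # torch.Size([4, 256, 64])
```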