This paper proposes CCL-LGS, a novel framework for 3D semantic understanding. Existing methods that rely on 2D priors suffer from cross-view semantic inconsistencies caused by occlusion, image blur, and view-dependent variations. CCL-LGS addresses this problem by incorporating multi-view semantic cues to enforce view-consistent semantic supervision. Specifically, we align SAM-generated 2D masks across views using a zero-shot tracker, extract robust semantic encodings with CLIP, and learn discriminative semantic features through the Contrastive Codebook Learning (CCL) module, which enhances intra-class compactness and inter-class distinctiveness. Unlike existing methods, CCL-LGS does not apply CLIP directly to incomplete masks; instead, it explicitly resolves semantic conflicts while maintaining category discriminability. Experimental results demonstrate that CCL-LGS outperforms existing state-of-the-art methods.
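
To make the two objectives named above concrete, the following is a minimal sketch of a loss that rewards intra-class compactness and inter-class distinctiveness. It is an illustration under stated assumptions, not the CCL module itself: the prototype-based formulation, the use of cosine similarity, the `margin` parameter, and all function names are hypothetical choices made here. The labels stand in for the mask identities that a tracker would assign across views.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_prototypes(features, labels):
    """Mean of the unit-normalized features that share a label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        f = normalize(f)
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def contrastive_codebook_loss(features, labels, margin=0.5):
    """Return (intra, inter) loss terms; both are hypothetical stand-ins.

    intra: mean cosine distance between each feature and its class prototype.
    inter: mean hinge penalty on prototype pairs more similar than `margin`.
    """
    protos = class_prototypes(features, labels)
    # Intra-class term: pull each feature toward its own prototype.
    intra = sum(
        1.0 - cosine(f, protos[y]) for f, y in zip(features, labels)
    ) / len(features)
    # Inter-class term: push apart prototypes that are too similar.
    keys = sorted(protos)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = 0.0
    if pairs:
        inter = sum(
            max(0.0, cosine(protos[a], protos[b]) - margin) for a, b in pairs
        ) / len(pairs)
    return intra, inter
```

In this sketch, features that agree within a class and differ between classes give both terms a value of zero, whereas features with conflicting labels across views raise both terms. A training setup would typically minimize a weighted sum of the two; the weighting is left open here.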