This paper presents ToothMCL, a multimodal contrastive learning framework for accurate tooth segmentation in digital dentistry. To overcome the limitations of existing tooth segmentation methods, ToothMCL applies multimodal contrastive learning that integrates Cone-Beam Computed Tomography (CBCT) and Intraoral Scan (IOS) data. This learning yields modality-invariant representations and accurately models fine anatomical features, which enables precise multi-class segmentation and FDI tooth number identification. Furthermore, we construct a large-scale dataset, CBCT-IOS3.8K, containing data from 3,867 patients. We evaluate ToothMCL on multiple independent datasets, where it outperforms existing methods, improving the Dice Similarity Coefficient (DSC) by 12% for CBCT segmentation and by 8% for IOS segmentation.
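The abstract does not state ToothMCL's exact training objective. As a minimal sketch of how cross-modal contrastive learning can push CBCT and IOS features toward a shared, modality-invariant space, the example below assumes a CLIP-style symmetric InfoNCE loss. It treats row *i* of a CBCT embedding batch and row *i* of an IOS embedding batch as a positive pair and all other rows as negatives. The function names, the temperature value, and the pairing scheme are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diagonal(rows: List[List[float]]) -> float:
    # Mean cross-entropy where the correct "class" for row i is column i.
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(rows)


def cross_modal_infonce(cbct: List[Vector], ios: List[Vector],
                        temperature: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss between paired embeddings.

    ASSUMPTION: cbct[i] and ios[i] come from the same patient/tooth and
    form a positive pair; every other pairing in the batch is a negative.
    """
    if len(cbct) != len(ios) or not cbct:
        raise ValueError("need equal-sized, non-empty batches")
    a = [l2_normalize(v) for v in cbct]
    b = [l2_normalize(v) for v in ios]
    n = len(a)
    # logits[i][j]: temperature-scaled similarity of CBCT i and IOS j.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]
    transposed = [list(col) for col in zip(*logits)]
    # Average the CBCT->IOS and IOS->CBCT directions.
    return 0.5 * (_cross_entropy_diagonal(logits)
                  + _cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    cbct_batch = [[1.0, 0.0], [0.0, 1.0]]
    ios_batch = [[0.9, 0.1], [0.1, 0.9]]
    print(cross_modal_infonce(cbct_batch, ios_batch))        # small: pairs agree
    print(cross_modal_infonce(cbct_batch, ios_batch[::-1]))  # large: pairs swapped
```

Correctly matched pairs give a much lower loss than mismatched ones. Minimizing such a loss therefore pulls the two modalities' embeddings of the same anatomy together, which is one common way to obtain modality-invariant features.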