Inspired by the success of large language models (LLMs), this paper presents SpecCLIP, a foundation-model framework that extends LLM-inspired methods to stellar spectral analysis. Stellar spectra, like structured language, encode rich physical and chemical information about stars. SpecCLIP is pre-trained on two spectral types, LAMOST low-resolution spectra and Gaia XP spectra, and applies contrastive learning within the CLIP (Contrastive Language-Image Pre-training) framework to associate spectra acquired by different instruments. Auxiliary decoders are added to preserve spectrum-specific information and to enable translation between spectral types, thereby maximizing the mutual information between the embeddings and the input spectra. The result is a cross-spectrum framework that is inherently calibrated and flexible across instruments. Fine-tuning on moderate-sized labelled datasets improves the model's adaptability to downstream tasks such as stellar parameter estimation and chemical abundance determination, improving the accuracy and precision of parameter estimation when benchmarked against external survey data. The framework's similarity search and cross-spectrum prediction capabilities can also be exploited for anomaly detection. We demonstrate that contrastive-learning foundation models, enhanced with spectrum-aware decoders, can advance precision stellar spectroscopy.
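To make the training objective concrete, the following is a minimal PyTorch sketch of a CLIP-style contrastive loss between two spectral modalities, combined with auxiliary decoders for same-type reconstruction and cross-type translation. The encoder architectures, spectrum lengths, embedding size, and loss weight are illustrative assumptions for exposition, not the SpecCLIP authors' actual configuration.

```python
# Sketch of a CLIP-style objective for two spectral modalities, with
# auxiliary decoders that reconstruct each spectrum and translate it to
# the other spectral type. All dimensions and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectrumEncoder(nn.Module):
    """Toy MLP encoder mapping a 1-D spectrum to an embedding."""

    def __init__(self, n_pix: int, d_emb: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_pix, 512), nn.GELU(), nn.Linear(512, d_emb)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SpecCLIPSketch(nn.Module):
    def __init__(self, n_lamost: int = 3500, n_xp: int = 110, d_emb: int = 128):
        super().__init__()
        self.enc_lamost = SpectrumEncoder(n_lamost, d_emb)
        self.enc_xp = SpectrumEncoder(n_xp, d_emb)
        # Auxiliary decoders: map an embedding back to either spectral
        # type, supporting reconstruction and cross-type translation.
        self.dec_lamost = nn.Linear(d_emb, n_lamost)
        self.dec_xp = nn.Linear(d_emb, n_xp)
        self.log_tau = nn.Parameter(torch.tensor(0.0))  # learnable temperature

    def forward(self, x_lamost: torch.Tensor, x_xp: torch.Tensor) -> torch.Tensor:
        z_l = F.normalize(self.enc_lamost(x_lamost), dim=-1)
        z_x = F.normalize(self.enc_xp(x_xp), dim=-1)
        # Symmetric InfoNCE: matched LAMOST/XP pairs of the same star are
        # positives; all other pairs in the batch serve as negatives.
        logits = z_l @ z_x.T * self.log_tau.exp()
        labels = torch.arange(z_l.size(0), device=z_l.device)
        loss_clip = 0.5 * (
            F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)
        )
        # Reconstruction (same type) and translation (cross type) terms
        # keep spectrum-specific information in the embeddings.
        loss_dec = (
            F.mse_loss(self.dec_lamost(z_l), x_lamost)
            + F.mse_loss(self.dec_xp(z_x), x_xp)
            + F.mse_loss(self.dec_xp(z_l), x_xp)
            + F.mse_loss(self.dec_lamost(z_x), x_lamost)
        )
        return loss_clip + 0.1 * loss_dec  # 0.1 is an assumed weight


# Usage with random stand-in spectra:
model = SpecCLIPSketch()
loss = model(torch.randn(32, 3500), torch.randn(32, 110))
loss.backward()
```

The decoder terms are what distinguish this from plain CLIP: without them, the contrastive objective alone can discard information that is unique to one spectral type, whereas the reconstruction and translation losses force the embeddings to retain it, which is what enables conversion between spectral types downstream.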