This paper proposes CSI-BERT2, an integrated framework for predicting and classifying channel state information (CSI), which plays a crucial role in wireless communication and sensing systems. Building on CSI-BERT, it captures complex relationships within CSI sequences through a bidirectional self-attention mechanism. Specifically, to address data insufficiency and packet loss, we introduce a two-stage training approach based on a Masked Language Model (MLM): pre-training is performed for general feature extraction, followed by fine-tuning for specific tasks. For CSI prediction tasks, we extend the MLM to a Mask Prediction Model (MPM), introduce an Adaptive Reweighting Layer (ARL) to enhance subcarrier representation, and add an MLP-based temporal embedding module to address temporal information loss. Experimental results on real and simulated datasets demonstrate that CSI-BERT2 achieves state-of-the-art performance across all tasks and operates effectively even with discontinuous CSI sequences resulting from varying sampling rates and packet loss.
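To illustrate the MLM-style pre-training idea in generic terms, the following is a minimal sketch of BERT-style random masking applied to a CSI sequence, with a reconstruction loss evaluated only at masked time steps. It is not the CSI-BERT2 implementation: the array layout `(T, S)` (time steps by subcarriers), the 15% mask ratio, the zero-valued mask token, and the helper names `mask_csi_sequence` and `masked_mse` are all assumptions made for illustration.

```python
import numpy as np


def mask_csi_sequence(csi, mask_ratio=0.15, mask_value=0.0, rng=None):
    """Randomly mask whole time steps of a CSI sequence (BERT-style).

    Assumed layout: csi has shape (T, S), T time steps x S subcarriers.
    Returns (masked_csi, mask), where mask[t] is True if step t was masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = csi.shape[0]
    # Mask at least one step so the loss is always defined.
    n_mask = max(1, int(round(mask_ratio * T)))
    idx = rng.choice(T, size=n_mask, replace=False)
    mask = np.zeros(T, dtype=bool)
    mask[idx] = True
    masked = csi.copy()
    # Hypothetical mask token: overwrite every subcarrier of a masked step.
    masked[mask] = mask_value
    return masked, mask


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over masked time steps only."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


# Usage with synthetic data (hypothetical: 100 packets, 52 subcarriers).
rng = np.random.default_rng(0)
csi = rng.random((100, 52))
masked, mask = mask_csi_sequence(csi, mask_ratio=0.15, rng=rng)
print(int(mask.sum()))  # number of masked time steps
```

Under this assumed setup, a model would be trained to recover the original values at the masked steps. A plausible (again assumed) link to the packet-loss setting is that lost packets could be presented to the model as masked positions at inference time.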