Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), this paper proposes Soft-Transformers (Soft-TF), a novel fully fine-tuned continual learning (CL) method that sequentially trains and selects an optimal soft network for each task. Soft-TF keeps the parameters of the pre-trained layers fixed during continual learning while optimizing the weights of sparse layers with well-initialized Soft-TF masks, yielding task-adaptive soft (real-valued) networks. During inference, the identified task-adaptive mask is applied to the parameters of the pre-trained network, mapping it to the optimal solution for each task and thereby minimizing catastrophic forgetting (CF); this soft-masking also preserves the knowledge of the pre-trained network. Extensive experiments on the Vision Transformer (ViT) and the Language Transformer (BERT) demonstrate the effectiveness of Soft-TF, achieving state-of-the-art performance in vision and language class-incremental learning (CIL) scenarios.
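The following is a minimal conceptual sketch of the soft-masking idea summarized above, not the authors' implementation: it assumes a frozen pre-trained linear layer and one learnable real-valued mask per task, with the class name `SoftMaskedLinear` and its interface introduced purely for illustration.

```python
# Conceptual sketch (assumed interface, not the paper's code): frozen
# pre-trained weights modulated element-wise by per-task soft masks.
import torch
import torch.nn as nn


class SoftMaskedLinear(nn.Module):
    """Frozen pre-trained weights combined with per-task soft (real-valued) masks."""

    def __init__(self, pretrained_linear: nn.Linear, num_tasks: int):
        super().__init__()
        # Keep the pre-trained parameters fixed during continual learning.
        self.weight = nn.Parameter(pretrained_linear.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = nn.Parameter(pretrained_linear.bias.detach().clone(),
                                 requires_grad=False)
        # One soft mask per task, initialized to ones so that training starts
        # from the pre-trained network itself (a "well-initialized" starting point).
        self.masks = nn.ParameterList(
            [nn.Parameter(torch.ones_like(self.weight)) for _ in range(num_tasks)]
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Element-wise soft masking maps the frozen weights to a task-adaptive
        # network; only masks[task_id] receives gradients during training.
        masked_weight = self.masks[task_id] * self.weight
        return nn.functional.linear(x, masked_weight, self.bias)


if __name__ == "__main__":
    layer = SoftMaskedLinear(nn.Linear(16, 8), num_tasks=3)
    out = layer(torch.randn(4, 16), task_id=0)
    print(out.shape)  # torch.Size([4, 8])
```

At inference time, selecting the mask stored for a given task reproduces that task's adapted network without modifying the shared pre-trained weights, which is the mechanism by which catastrophic forgetting is avoided in this sketch.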