In this paper, we propose Diffusion-FSCIL, a method that uses a frozen pre-trained text-to-image diffusion model as a backbone to address few-shot class-incremental learning (FSCIL), where training data are severely limited. We leverage the strengths of large-scale generative models: the generative capacity gained through large-scale pre-training, multi-scale representations, and representational flexibility via the text encoder. We extract multiple complementary diffusion features to serve as latent replay, with a light use of feature distillation to prevent generative bias. Efficiency is achieved by keeping the backbone frozen, training only minimal components, and extracting multiple features in a single batched pass. Experimental results on CUB-200, miniImageNet, and CIFAR-100 show that Diffusion-FSCIL outperforms existing state-of-the-art methods, adapting effectively to new classes while preserving performance on previously learned ones.
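To make the core mechanics concrete, the sketch below illustrates the general pattern of a frozen backbone with multi-scale feature extraction and a minimal trainable head. This is a toy PyTorch illustration under stated assumptions, not the paper's implementation: a small convolutional stack stands in for the pre-trained diffusion U-Net, forward hooks collect complementary features at several scales in one pass, and the pooled features (which could serve as latent replay) feed a single trainable linear head.

```python
import torch
import torch.nn as nn

# Toy frozen "backbone" standing in for the pre-trained diffusion model
# (hypothetical stand-in; the actual method uses a text-to-image diffusion U-Net).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1),    # scale 1
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1),   # scale 2
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # scale 3
    nn.ReLU(),
)
for p in backbone.parameters():
    p.requires_grad = False  # fixed backbone: no gradient updates

# Forward hooks capture complementary multi-scale features in one batched pass.
features = {}
def make_hook(name):
    def hook(module, inp, out):
        features[name] = out.detach()
    return hook

for idx in (0, 2, 4):  # the three conv layers
    backbone[idx].register_forward_hook(make_hook(f"scale{idx}"))

# A minimal trainable head on top of pooled, concatenated features.
head = nn.Linear(8 + 16 + 32, 100)  # e.g. 100 classes

x = torch.randn(4, 3, 32, 32)  # a batch of images
with torch.no_grad():
    backbone(x)

pooled = [f.mean(dim=(2, 3)) for f in features.values()]  # global average pool
latent = torch.cat(pooled, dim=1)  # could be stored as latent replay
logits = head(latent)
print(logits.shape)  # torch.Size([4, 100])
```

The key design point this mirrors is that only the small head trains, so incremental sessions cannot corrupt the backbone's pre-trained representations, and the stored pooled features can be replayed for old classes without keeping raw images.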