To address the scarcity of annotated data and poor interpretability, two key challenges limiting the adoption of deep learning in medical image analysis, this paper proposes the Concept Bottleneck Vision-Language Model (CBVLM), which leverages a Large Vision-Language Model (LVLM). CBVLM first prompts the LVLM to identify the presence or absence of clinical concepts in an image, and then classifies the image based solely on the predicted concepts, which makes the decision interpretable. Furthermore, it integrates a retrieval module that selects the most suitable examples for in-context learning, reducing annotation costs since only a handful of annotated examples are needed as prompts. Extensive experiments on four medical datasets and twelve LVLMs demonstrate that CBVLM outperforms existing methods.
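
To make the two-stage pipeline concrete, the following is a minimal Python sketch under loud assumptions: the LVLM query is stubbed with placeholder answers, the concept list, retrieval metric, and concept-to-label classifier are illustrative choices, and none of the function names come from the paper.

```python
# Hypothetical sketch of a CBVLM-style pipeline: retrieve in-context
# examples, ask an LVLM about concepts, classify from concepts alone.
# The LVLM call is stubbed; this is not the paper's implementation.

import numpy as np

CONCEPTS = ["asymmetry", "irregular border", "color variegation"]  # assumed concepts


def embed(image: np.ndarray) -> np.ndarray:
    """Placeholder image embedding; a real system would use a vision encoder."""
    flat = image.flatten()
    return flat / (np.linalg.norm(flat) + 1e-8)


def retrieve_examples(query: np.ndarray, pool: list, k: int = 2) -> list:
    """Retrieval module: pick the k annotated images most similar to the
    query (cosine similarity here, as an assumed metric) to serve as
    in-context examples for the LVLM prompt."""
    sims = [float(embed(query) @ embed(img)) for img, _ in pool]
    top = np.argsort(sims)[-k:]
    return [pool[i] for i in top]


def query_lvlm_for_concepts(image: np.ndarray, examples: list) -> np.ndarray:
    """Stage 1 (stubbed): ask the LVLM whether each concept is present.

    A real implementation would build a prompt containing the retrieved
    (image, concept-annotation) examples plus the query image, and parse
    the LVLM's yes/no answers into a binary vector.
    """
    rng = np.random.default_rng(0)
    return rng.integers(0, 2, size=len(CONCEPTS)).astype(float)  # fake answers


def classify_from_concepts(concept_vector: np.ndarray, weights: np.ndarray) -> int:
    """Stage 2: predict the diagnosis from the concept vector alone, so the
    final decision is traceable to human-readable concepts."""
    return int(concept_vector @ weights > 0.5)


if __name__ == "__main__":
    # Small annotated pool: (image, concept annotations) pairs.
    pool = [(np.random.rand(8, 8), np.array([1.0, 0.0, 1.0])) for _ in range(5)]
    test_image = np.random.rand(8, 8)

    examples = retrieve_examples(test_image, pool, k=2)
    concepts = query_lvlm_for_concepts(test_image, examples)
    weights = np.array([0.5, 0.3, 0.4])  # assumed concept-to-label weights
    print("predicted concepts:", dict(zip(CONCEPTS, concepts)))
    print("predicted class:", classify_from_concepts(concepts, weights))
```

Because the classifier only ever sees the concept vector, each prediction can be audited concept by concept, which is the interpretability benefit the abstract attributes to the concept bottleneck.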