This paper proposes Context-Adaptive Multi-Prompt Embedding, a novel method for enriching semantic representations in vision-language contrastive learning. Unlike existing CLIP-style models that rely on a single text embedding, our approach introduces multiple structured prompts, each containing distinct adaptive tokens that capture different semantic aspects of the input text. Within the CLIP framework, we leverage a pre-trained LLM as the text encoder to jointly process all prompts in a single pass. The resulting prompt embeddings are combined into a unified text representation, enabling richer semantic alignment with visual features. To further enhance semantic diversity and representational quality, we incorporate a diversity regularization loss and a negation recognition loss that encourage specialization among prompts and improve contrastive discrimination. Our method achieves consistent performance gains on image-to-text and video-to-text retrieval benchmarks.
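
The multi-prompt text encoding and the diversity regularization can be illustrated with a minimal sketch. The module below assumes a HuggingFace-style LLM text encoder accepting `inputs_embeds` and returning `last_hidden_state`, last-token pooling, mean aggregation of the prompt embeddings, and a squared-cosine penalty on off-diagonal prompt similarities; the names (`MultiPromptTextHead`, `diversity_loss`) and these design choices are illustrative assumptions, not the exact implementation described in this paper.

```python
import torch
import torch.nn.functional as F
from torch import nn


class MultiPromptTextHead(nn.Module):
    """Sketch: K structured prompts, each with its own learnable adaptive tokens,
    are encoded by a pre-trained LLM text encoder in a single pass and pooled
    into one unified text embedding (assumed pooling/aggregation choices)."""

    def __init__(self, llm_encoder, num_prompts=4, num_adaptive_tokens=4, dim=768):
        super().__init__()
        self.llm = llm_encoder  # pre-trained LLM text encoder (assumption: HF-style API)
        # One set of learnable adaptive tokens per prompt (assumed initialization).
        self.adaptive_tokens = nn.Parameter(
            torch.randn(num_prompts, num_adaptive_tokens, dim) * 0.02
        )

    def forward(self, caption_token_embeds):
        # caption_token_embeds: (B, L, D) token embeddings of the input caption.
        B, L, D = caption_token_embeds.shape
        K, T, _ = self.adaptive_tokens.shape
        # Prepend each prompt's adaptive tokens to the caption and encode all
        # K prompts jointly by folding them into the batch dimension.
        tokens = torch.cat(
            [self.adaptive_tokens.unsqueeze(0).expand(B, -1, -1, -1),   # (B, K, T, D)
             caption_token_embeds.unsqueeze(1).expand(-1, K, -1, -1)],  # (B, K, L, D)
            dim=2,
        ).reshape(B * K, T + L, D)
        hidden = self.llm(inputs_embeds=tokens).last_hidden_state       # (B*K, T+L, H)
        # Last-token pooling per prompt (assumes encoder hidden size == dim).
        prompt_embeds = F.normalize(hidden[:, -1].reshape(B, K, D), dim=-1)
        # Combine prompt embeddings into the unified text representation.
        text_embed = F.normalize(prompt_embeds.mean(dim=1), dim=-1)
        return text_embed, prompt_embeds


def diversity_loss(prompt_embeds):
    """Penalize pairwise cosine similarity among a sample's prompt embeddings
    so each prompt specializes in a different semantic aspect (assumed form)."""
    sim = prompt_embeds @ prompt_embeds.transpose(1, 2)                  # (B, K, K)
    K = sim.shape[-1]
    off_diag = sim - torch.eye(K, device=sim.device)                     # zero the diagonal
    return off_diag.pow(2).sum(dim=(1, 2)).mean() / (K * (K - 1))
```

Under these assumptions, the unified `text_embed` replaces the single CLIP text embedding in the standard contrastive loss against visual features, while the diversity term keeps the K prompt embeddings from collapsing onto one another.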