This paper studies the robustness of Vision-Language Models (VLMs) such as CLIP, which are attractive for defense applications where labeled data is scarce. To investigate CLIP's robustness in challenging military environments, such as partial occlusion and low signal-to-noise ratio (SNR), we evaluated the Normalized Area Under the Curve (NAUC) of classification accuracy as a function of occlusion percentage on a custom dataset of 18 military vehicle classes. We found that the Transformer-based CLIP model outperformed CNNs, and that fine-grained, distributed occlusions degraded performance more than coarse, continuous occlusions. Furthermore, the linear-probe model degrades rapidly beyond approximately 35% occlusion, whereas fine-tuning the backbone delays this degradation until occlusion reaches roughly 60% or more.
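As a minimal sketch of the evaluation metric, assuming NAUC here means the area under the accuracy-vs-occlusion curve normalized by the occlusion range (the paper's exact definition may differ), the computation reduces to a trapezoidal integral; the accuracy values below are hypothetical, not results from the paper:

```python
import numpy as np

def nauc(occlusion_pct, accuracy):
    """Area under the accuracy-vs-occlusion curve, normalized by the
    occlusion range so the score lies in [0, 1] (trapezoidal rule)."""
    x = np.asarray(occlusion_pct, dtype=float)
    y = np.asarray(accuracy, dtype=float)
    area = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))  # trapezoid sum
    return float(area / (x[-1] - x[0]))

# Hypothetical accuracies at increasing occlusion levels (illustrative only).
occ = [0, 20, 40, 60, 80]
acc = [0.95, 0.90, 0.70, 0.40, 0.10]
print(round(nauc(occ, acc), 3))
```

A model whose accuracy stays high until heavy occlusion yields an NAUC near 1, so the metric summarizes the entire degradation curve in a single score.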