Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Occlusion Robustness of CLIP for Military Vehicle Classification

Created by
  • Haebom

Author

Jan Erik van Woerden, Gertjan Burghouts, Lotte Nijskens, Alma M. Liezenga, Sabina van Rooij, Frank Ruis, Hugo J. Kuijf

Outline

This paper studies the robustness of Vision-Language Models (VLMs) such as CLIP, which are useful in defense applications where labeled data is scarce. To investigate CLIP's robustness under challenging military conditions such as partial occlusion and low signal-to-noise ratio (SNR), the authors evaluate the Normalized Area Under the Curve (NAUC) of classification accuracy as a function of occlusion percentage on a custom dataset of 18 military vehicle classes. Transformer-based CLIP models outperform CNNs, and fine-grained, dispersed occlusions degrade performance more than coarse, contiguous ones. Furthermore, a linear-probe model degrades sharply at around 35% occlusion, whereas fine-tuning the backbone delays this degradation until occlusion exceeds roughly 60%.
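The paper does not spell out the exact NAUC formula, but a common reading is the trapezoidal area under the accuracy-vs-occlusion curve, normalized so that a model holding perfect accuracy at every occlusion level scores 1.0. A minimal sketch under that assumption (the accuracy curves below are hypothetical, not the paper's numbers):

```python
import numpy as np

def normalized_auc(occlusion_pcts, accuracies):
    """Trapezoidal area under the accuracy-vs-occlusion curve,
    normalized so that a model at 100% accuracy everywhere scores 1.0.
    (Assumed definition; the paper does not give the exact formula.)"""
    occl = np.asarray(occlusion_pcts, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    area = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(occl))
    max_area = occl[-1] - occl[0]  # perfect model holds accuracy 1.0
    return area / max_area

# Hypothetical accuracy curves illustrating the two settings described above
occl = [0, 20, 40, 60, 80, 100]
linear_probe = [0.92, 0.88, 0.55, 0.30, 0.15, 0.06]  # drops sharply past ~35%
fine_tuned   = [0.95, 0.93, 0.90, 0.82, 0.50, 0.10]  # degrades much later

print(round(normalized_auc(occl, linear_probe), 3))  # 0.474
print(round(normalized_auc(occl, fine_tuned), 3))    # 0.735
```

A single NAUC number summarizes robustness across the whole occlusion sweep, which is why it is more informative here than accuracy at one fixed occlusion level.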

Takeaways, Limitations

Takeaways:
Transformer-based CLIP models are shown to be more robust to occlusion than CNNs.
Fine-grained, dispersed occlusions degrade performance more than coarse, contiguous ones.
Robustness to occlusion can be improved by fine-tuning the backbone.
The results emphasize the importance of occlusion-specific augmentation during training.
Limitations:
The study is limited to a specific military vehicle dataset.
Further research is needed on patch-level sensitivity and architectural resilience.
Additional validation required for real-world deployment.