Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries here are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP

Created by
  • Haebom

Authors

Ke Ma, Jun Long, Hongxiao Fei, Liujie Hua, Yiran Qian, Zhen Dai, Yueyi Luo

Outline

This paper presents a framework that addresses the adaptation gap pre-trained vision-language models (VLMs) face when applied to zero-shot anomaly detection (ZSAD). The gap stems from two limitations: VLMs lack the local inductive bias that dense prediction requires, and they rely on an inflexible feature fusion paradigm. The authors propose an architectural co-design framework that jointly improves feature representation and cross-modal fusion. Specifically, they integrate a parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter to inject local inductive bias for fine-grained representations, and introduce a Dynamic Fusion Gateway (DFG) that uses visual context to adaptively modulate the text prompts, enabling robust bidirectional fusion. Extensive experiments on a range of industrial and medical benchmarks show strong accuracy and robustness, indicating that this synergistic co-design is important for robustly adapting foundation models to dense perception tasks.
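To make the two components more concrete, here is a minimal pure-Python sketch of the general ideas, not the authors' implementation. It assumes that Conv-LoRA can be pictured as a low-rank down-projection, a local 3x3 mixing step in the low-rank space, and an up-projection, and that the DFG can be pictured as a sigmoid gate that blends a text embedding with a global visual context vector. The function names `conv_lora_delta` and `gated_text_prompt`, the fixed mean filter (standing in for a learned convolution kernel), and the scalar gate are all hypothetical choices made for illustration.

```python
import math
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def conv_lora_delta(grid, A, B):
    """Low-rank update with local spatial mixing in the rank-r space.

    grid: H x W nested list of d-dimensional patch features.
    A: r x d down-projection, B: d x r up-projection (assumed shapes).
    """
    H, W = len(grid), len(grid[0])
    r = len(A)
    # Project every patch down to r dimensions.
    low = [[matvec(A, grid[i][j]) for j in range(W)] for i in range(H)]
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            # Average over the 3x3 neighbourhood clipped to the grid.
            # Assumption: a fixed mean filter stands in for a learned kernel.
            nbrs = [low[a][b]
                    for a in range(max(0, i - 1), min(H, i + 2))
                    for b in range(max(0, j - 1), min(W, j + 2))]
            mixed = [sum(n[k] for n in nbrs) / len(nbrs) for k in range(r)]
            # Project back up to d dimensions.
            row.append(matvec(B, mixed))
        out.append(row)
    return out


def gated_text_prompt(text, visual_ctx, gate_w):
    """Blend a text embedding with visual context using a scalar sigmoid gate.

    gate_w has length 2 * d and scores the concatenation [text; visual_ctx].
    """
    score = sum(w * x for w, x in zip(gate_w, text + visual_ctx))
    g = 1.0 / (1.0 + math.exp(-score))
    adapted = [(1 - g) * t + g * v for t, v in zip(text, visual_ctx)]
    return adapted, g


# Toy usage with random values (all sizes are illustrative).
random.seed(0)
d, r, H, W = 4, 2, 3, 3
patches = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(W)]
           for _ in range(H)]
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]
B = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d)]
delta = conv_lora_delta(patches, A, B)  # would be added to the frozen layer's output

flat = [p for row in patches for p in row]
visual_ctx = [sum(p[k] for p in flat) / len(flat) for k in range(d)]  # mean patch feature
text = [random.gauss(0, 1) for _ in range(d)]
gate_w = [random.gauss(0, 0.1) for _ in range(2 * d)]
adapted, g = gated_text_prompt(text, visual_ctx, gate_w)
print(f"gate={g:.3f}, adapted prompt dim={len(adapted)}")
```

In this sketch only `A`, `B`, and `gate_w` would be trainable, which is what keeps such an adapter parameter-efficient: the low-rank pair holds 2·d·r values instead of the d·d of a full weight matrix. The actual kernel shapes, gate form, and placement inside CLIP are defined in the paper itself.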

Takeaways, Limitations

Takeaways:
Presents an effective method for improving the zero-shot anomaly detection performance of VLMs.
Emphasizes the importance of architectural co-design through Conv-LoRA and DFG.
Opens new possibilities for applying foundation models to dense perception tasks.
Confirms applicability across various industrial and medical domains.
Limitations:
The computational cost and complexity of the proposed method are not analyzed.
Generalization to a wider range of VLMs still needs to be verified.
Further validation in real-world application environments is needed.
Performance may be biased toward certain types of anomalies.