Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP

Created by
  • Haebom

Authors

Ke Ma, Jun Long, Hongxiao Fei, Liujie Hua, Yiran Qian, Zhen Dai, Yueyi Luo

Outline

This paper presents a framework that addresses the adaptation gap that arises when pre-trained vision-language models (VLMs) are applied to zero-shot anomaly detection (ZSAD). The gap stems from two weaknesses: VLMs lack the local inductive bias needed for dense prediction, and they rely on an inflexible feature fusion paradigm. To address both, the authors propose an architectural co-design framework that jointly improves feature representation and cross-modal fusion. A parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter injects local inductive bias for fine-grained representations, and a Dynamic Fusion Gateway (DFG) uses visual context to adaptively modulate text prompts, enabling robust bidirectional fusion. Extensive experiments on industrial and medical benchmarks show strong accuracy and robustness, supporting the claim that synergistic co-design is important for robustly adapting foundation models to dense perception tasks.
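To make the two components more concrete, here is a minimal NumPy sketch of how a Conv-LoRA-style adapter and a DFG-style gate could be wired together. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the layer layout, so the depthwise 3x3 convolution inside the bottleneck, the sigmoid gate, the mean-pooled visual context, and every function name, shape, and hyperparameter below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def depthwise_conv3x3(x, kernels):
    """Zero-padded, stride-1 depthwise 3x3 convolution.

    x: (H, W, r) grid of bottleneck features; kernels: (3, 3, r).
    """
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernels[dy, dx, :]
    return out


def conv_lora_linear(tokens, W_frozen, A, B, kernels, grid_hw, scale=1.0):
    """Frozen linear layer plus a bottleneck update with local mixing.

    Assumed layout: project patch tokens down to rank r, mix neighbouring
    patches with a depthwise conv on the patch grid, then project back up.
    tokens: (H*W, d_in); W_frozen: (d_out, d_in); A: (r, d_in); B: (d_out, r).
    """
    H, W = grid_hw
    base = tokens @ W_frozen.T                     # frozen path, (N, d_out)
    low = (tokens @ A.T).reshape(H, W, -1)         # down-projection, (H, W, r)
    low = low + depthwise_conv3x3(low, kernels)    # local inductive bias
    update = low.reshape(H * W, -1) @ B.T          # up-projection, (N, d_out)
    return base + scale * update


def dynamic_fusion_gate(text_emb, patch_feats, W_gate, W_shift):
    """Shift text prompt embeddings using pooled visual context (assumed form).

    text_emb: (K, d); patch_feats: (N, d); W_gate, W_shift: (d, d).
    """
    ctx = patch_feats.mean(axis=0)                 # global visual context, (d,)
    gate = 1.0 / (1.0 + np.exp(-(W_gate @ ctx)))   # per-channel gate in (0, 1)
    return text_emb + gate * (W_shift @ ctx)       # broadcast over K prompts


def anomaly_map(patch_feats, text_emb, grid_hw, temperature=0.07):
    """Per-patch probability of the 'abnormal' prompt (row 1 of text_emb)."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = p @ t.T / temperature                 # cosine similarity, (N, K)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1].reshape(grid_hw)


# Toy usage with random stand-ins for CLIP features (all sizes are arbitrary).
H, Wd, d, r = 4, 4, 16, 2
tokens = rng.normal(size=(H * Wd, d))
W_frozen = rng.normal(size=(d, d)) / np.sqrt(d)
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))            # zero init: adapter starts as a no-op
kernels = rng.normal(size=(3, 3, r)) * 0.01
feats = conv_lora_linear(tokens, W_frozen, A, B, kernels, (H, Wd))

text = rng.normal(size=(2, d))  # row 0: "normal" prompt, row 1: "abnormal"
W_gate = rng.normal(size=(d, d)) / np.sqrt(d)
W_shift = rng.normal(size=(d, d)) / np.sqrt(d)
fused_text = dynamic_fusion_gate(text, feats, W_gate, W_shift)
amap = anomaly_map(feats, fused_text, (H, Wd))
```

In this sketch only `A`, `B`, `kernels`, `W_gate`, and `W_shift` would be trained while `W_frozen` stays fixed, which is one way to read "parameter-efficient". Initializing `B` to zeros follows the usual LoRA convention, so the adapted layer initially reproduces the frozen model's output.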

Takeaways, Limitations

Takeaways:
An effective solution to the adaptation gap that arises when VLMs are applied to ZSAD.
Parameter-efficient adaptation and improved performance via Conv-LoRA and DFG.
Demonstrated accuracy and robustness on a range of industrial and medical benchmarks.
A novel approach to adapting foundation models for dense perception tasks.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Generalizability may be limited because the benchmarks are confined to specific industrial and medical domains.
Further research is needed on optimal hyperparameter settings for Conv-LoRA and DFG.
Further validation of the applicability to other types of anomaly detection problems is needed.