Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.

A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning

Created by
  • Haebom

Author

Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu

Outline

Defending against backdoor attacks in federated learning (FL) with heterogeneous client data distributions is challenging because effectiveness and privacy must be balanced: existing methods rely heavily on uniform (IID) client data or on the availability of clean server datasets. This paper proposes CLIP-Fed, an FL backdoor-defense framework that leverages the zero-shot learning capabilities of vision-language pre-trained models. By integrating pre-aggregation and post-aggregation defense strategies, CLIP-Fed overcomes the limitation that Non-IID data imposes on defense effectiveness. Using prototype contrastive loss and Kullback-Leibler divergence, CLIP-Fed aligns the global model's knowledge with CLIP on an augmented dataset, addressing class prototype deviations caused by backdoor samples and eliminating the correlation between trigger patterns and target labels. To address privacy concerns while broadening coverage against diverse triggers, the server dataset is constructed and augmented without any client samples, using a multimodal large language model and frequency analysis. Experiments show that CLIP-Fed reduces the average attack success rate (ASR) by 2.03% on CIFAR-10 and 1.35% on CIFAR-10-LT, while improving average main task accuracy (MTA) by 7.92% and 0.48%, respectively, compared to existing methods.
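Below is a minimal PyTorch sketch of the server-side alignment step described above. The function names, the assumption that the global model returns a (features, logits) pair, the CLIP zero-shot wrapper, and the loss weights are illustrative assumptions for this summary, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.07):
    """Pull each feature toward its class prototype and away from the others.
    `prototypes` is a (num_classes, dim) tensor, e.g. CLIP text embeddings."""
    features = F.normalize(features, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = features @ prototypes.t() / temperature  # (batch, num_classes)
    return F.cross_entropy(logits, labels)

def kl_alignment_loss(global_logits, clip_logits):
    """KL divergence that distills CLIP's zero-shot predictions into the
    aggregated global model, weakening trigger-to-target-label shortcuts."""
    return F.kl_div(F.log_softmax(global_logits, dim=-1),
                    F.softmax(clip_logits, dim=-1),
                    reduction="batchmean")

def server_alignment_step(global_model, clip_zero_shot, prototypes,
                          augmented_loader, optimizer,
                          proto_weight=1.0, kl_weight=1.0):
    """One pass over the server-side augmented dataset (no client samples)."""
    global_model.train()
    for images, labels in augmented_loader:
        feats, logits = global_model(images)      # assumed to return (features, logits)
        with torch.no_grad():
            clip_logits = clip_zero_shot(images)  # CLIP zero-shot class scores
        loss = (proto_weight * prototype_contrastive_loss(feats, labels, prototypes)
                + kl_weight * kl_alignment_loss(logits, clip_logits))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```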

Takeaways, Limitations

Takeaways:
Addresses backdoor defense in environments with heterogeneous (Non-IID) client data distributions.
Leverages the zero-shot learning capabilities of vision-language pre-trained models.
Integrates pre-aggregation and post-aggregation defense strategies.
Aligns the global model with CLIP knowledge using prototype contrastive loss and Kullback-Leibler divergence.
Builds and augments the server dataset using a multimodal large language model and frequency analysis (see the sketch at the end of this page).
Improves performance over existing methods on the CIFAR-10 and CIFAR-10-LT datasets.
Limitations:
The paper does not explicitly discuss specific limitations (further investigation is needed).
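The paper builds its server dataset with a multimodal large language model and frequency analysis; the exact procedure is not reproduced in this summary. The sketch below shows one plausible form of frequency-domain, trigger-style augmentation using NumPy FFTs; the frequency band, strength, and noise model are illustrative assumptions, not the authors' method.

```python
import numpy as np

def frequency_augment(channel, band=(0.25, 0.5), strength=0.1, rng=None):
    """Perturb a selected frequency band of a single-channel image so the
    augmented server sample carries trigger-like high-frequency energy.
    All parameter values here are illustrative, not taken from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = channel.shape
    spectrum = np.fft.fftshift(np.fft.fft2(channel))
    # Radial frequency coordinates normalized to roughly [0, 1.4]; keep the target band.
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt(((yy - h / 2) / (h / 2)) ** 2 + ((xx - w / 2) / (w / 2)) ** 2)
    mask = (radius >= band[0]) & (radius < band[1])
    # Add random complex noise proportional to the mean spectral magnitude.
    noise = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    spectrum = spectrum + strength * np.abs(spectrum).mean() * noise * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```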