In this paper, we propose SHIELD, a novel collaborative learning method that addresses the vulnerability of deepfake audio detection. We experimentally show that existing deepfake audio detection methods are vulnerable to anti-forensics (AF) attacks based on generative adversarial networks, and we design a collaborative learning framework that integrates a defensive (DF) generative model to defend against them. SHIELD uses a triplet model to capture the correlation between real audio and AF-attacked audio, and between real audio and attack audio generated by an auxiliary generative model. SHIELD demonstrates strong performance across various generative models on the ASVspoof2019, In-the-Wild, and HalfTruth datasets, and effectively mitigates the degradation of detection accuracy caused by AF attacks.
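To make the triplet objective concrete, the following is a minimal sketch, not the authors' implementation: it trains a hypothetical `AudioEncoder` with a standard triplet margin loss, where real audio serves as the anchor, attack audio from the auxiliary generative model as the positive, and AF-attacked audio as the negative. The encoder architecture, embedding dimension, margin, and the role assignment of the three inputs are all assumptions for illustration.

```python
# Sketch of a triplet objective over real, auxiliary-generated, and
# AF-attacked audio. Not the authors' implementation; all architecture
# and hyperparameter choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Hypothetical embedding network for raw audio (assumption)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):  # x: (batch, 1, samples)
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

encoder = AudioEncoder()
triplet = nn.TripletMarginLoss(margin=0.5)  # margin is an assumed value

# Dummy 1-second waveforms at 16 kHz standing in for the three inputs:
# real audio (anchor), attack audio from the auxiliary generative model
# (positive), and AF-attacked audio (negative). The anchor/positive/negative
# role assignment is one plausible reading of the abstract, not confirmed.
real = torch.randn(8, 1, 16000)
aux_attack = torch.randn(8, 1, 16000)
af_attack = torch.randn(8, 1, 16000)

loss = triplet(encoder(real), encoder(aux_attack), encoder(af_attack))
loss.backward()
```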