This paper presents AUDETER, a large-scale, diverse deepfake audio dataset designed to address the generalization challenges of deepfake audio detection. Existing detection methods degrade in real-world environments because their training data differs from the audio they encounter in deployment. AUDETER addresses this gap with over 3 million audio clips (over 4,500 hours) generated by 11 text-to-speech models and 10 vocoders. Experiments show that state-of-the-art methods trained on existing datasets generalize poorly to new deepfake audio samples and exhibit high false positive rates. In contrast, methods trained on AUDETER detect these samples well and achieve significantly lower error rates.