Deepfake detection remains challenging in real-world scenarios due to the complex and evolving nature of forged content. Existing academic benchmarks typically feature homogeneous training sources and low-quality test images, which limits their value for assessing the real-world deployment of current detectors. To address this gap, we present the HydraFake dataset, which simulates real-world challenges through hierarchical generalization testing: it encompasses diverse deepfake techniques and in-the-wild forgeries, adopts rigorous training and evaluation protocols, and covers unseen model architectures, novel forgery methods, and novel data domains. Building on this resource, we propose Veritas, a deepfake detector based on a multimodal large language model (MLLM). Unlike conventional chain-of-thought (CoT) reasoning, we introduce pattern-guided reasoning, which incorporates key reasoning patterns such as "planning" and "self-reflection" to mimic human forensic processes. We further propose a two-stage training pipeline that seamlessly integrates these deepfake reasoning capabilities into existing MLLMs. Experiments on HydraFake show that prior detectors generalize well in cross-model scenarios but fall short on unseen forgery techniques and data domains, whereas Veritas achieves significant improvements across diverse out-of-distribution (OOD) scenarios while delivering transparent and accurate detection results.