Multi-stage (hybrid) deepfakes are generated by sequentially applying multiple deepfake generation methods, such as face swapping, GAN-based generation, and diffusion-based methods, and they can pose unexpected challenges for detection models trained on single-stage forgeries. In this study, we introduce FakeChain, a large-scale benchmark consisting of one-, two-, and three-stage forgeries. Using this benchmark, we analyze detection performance and spectral characteristics across manipulation stages, generator combinations, and quality settings. We find that detection performance depends strongly on the final manipulation type, with F1 scores decreasing by up to 58.83% when the test data deviates from the training distribution. This suggests that detectors rely on artifacts from the final stage rather than on the accumulated manipulation trace, which limits generalization.
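To make the grouping behind the "final manipulation type" finding concrete, the following is a minimal sketch of how F1 could be broken down by the last stage of a manipulation chain. It is not the FakeChain evaluation code: the function names, the record layout (a tuple of stage names plus a binary prediction), and the toy predictions are all assumptions made for illustration, and the numbers it produces are not results from the benchmark.

```python
def f1_score(y_true, y_pred):
    """Binary F1 with 'fake' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_by_final_stage(records):
    """Compute F1 separately for each final manipulation type.

    records: list of (chain, pred) pairs, where chain is a tuple of stage
    names in application order (empty tuple for a real image) and pred is
    the detector's binary output (1 = fake). This layout is hypothetical.
    """
    real_preds = [pred for chain, pred in records if not chain]
    final_stages = sorted({chain[-1] for chain, _ in records if chain})

    scores = {}
    for stage in final_stages:
        fake_preds = [p for chain, p in records if chain and chain[-1] == stage]
        # Each group is scored against the same pool of real images.
        y_true = [0] * len(real_preds) + [1] * len(fake_preds)
        y_pred = real_preds + fake_preds
        scores[stage] = f1_score(y_true, y_pred)
    return scores


# Toy, hand-made predictions (NOT benchmark data): a detector that mostly
# recognizes chains ending in face swapping and mostly misses chains
# ending in diffusion.
toy_records = [
    ((), 0), ((), 0), ((), 0), ((), 1),
    (("faceswap",), 1),
    (("gan", "faceswap"), 1),
    (("diffusion", "faceswap"), 1),
    (("faceswap", "diffusion"), 0),
    (("gan", "diffusion"), 0),
    (("faceswap", "gan", "diffusion"), 1),
]

scores = f1_by_final_stage(toy_records)
for stage, score in scores.items():
    print(f"final stage = {stage}: F1 = {score:.3f}")
```

Grouping by `chain[-1]` rather than by the full chain is the design choice that matters here: if a detector keys on final-stage artifacts, chains that share a last stage should score similarly regardless of what preceded it.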