This paper proposes DFI-OmniStereo, a novel method for omnidirectional depth perception that produces high-resolution depth maps via low-cost stereo depth estimation from omnidirectional cameras. To overcome the limitations of existing methods, we integrate a large-scale pre-trained foundation model for relative monocular depth estimation into an iterative optimization-based stereo matching architecture. Specifically, we exploit the relative monocular depth features through a two-stage training strategy with scale-invariant fine-tuning. On the real-world Helvipad dataset, we achieve state-of-the-art results, reducing disparity MAE by approximately 16% compared to the best previous omnidirectional stereo method.
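To make the notion of scale-invariant supervision concrete, the sketch below shows the common approach of aligning a relative (scale- and shift-ambiguous) depth prediction to metric ground truth via closed-form least squares before computing an error. This is an illustrative example of the general technique only, not the paper's actual training objective; all function names and the NumPy formulation are assumptions.

```python
import numpy as np

def align_scale_shift(pred_rel, gt, mask):
    """Closed-form least-squares scale s and shift t aligning a relative
    prediction to metric ground truth: min_{s,t} sum over mask of
    (s * pred + t - gt)^2."""
    p = pred_rel[mask]
    g = gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)  # (N, 2) design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s, t

def scale_invariant_mae(pred_rel, gt, mask):
    """Mean absolute error after optimal scale-and-shift alignment, so a
    prediction correct up to an affine transform incurs zero error."""
    s, t = align_scale_shift(pred_rel, gt, mask)
    return np.abs((s * pred_rel + t - gt)[mask]).mean()

# Toy check: a prediction that is an affine transform of the ground truth
# should yield (near-)zero scale-invariant error.
gt = np.linspace(1.0, 10.0, 100)
pred = 0.5 * gt - 2.0
mask = np.ones_like(gt, dtype=bool)
print(round(scale_invariant_mae(pred, gt, mask), 6))  # → 0.0
```

Invariance to scale and shift is what lets relative-depth foundation-model outputs be supervised against metric targets without the absolute-scale mismatch dominating the loss.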