This paper proposes THFlow, a novel flow-matching-based multimodal generative model for de novo 3D peptide design. Existing deep generative models attempt to converge on a target pocket by simultaneously modeling the peptide's position, orientation, and structure. However, in the early stages of generation, peptides are initialized far from the protein pocket, and the absence of interaction fields makes structure optimization alone physically ineffective. We define this as the multimodal temporal mismatch problem and argue that it is the primary cause of the low binding affinity of generated peptides. THFlow addresses this problem by explicitly modeling the temporal hierarchy between peptide position and structure. It uses a polynomial-based conditional flow to accelerate initial position convergence and subsequently performs coordinated structural refinement with rotation and torsion under the emerging interaction fields. Furthermore, it incorporates interaction-related features, such as polarity, to enhance the model's understanding of peptide-protein binding. Experimental results demonstrate that THFlow outperforms existing methods in generating peptides with superior stability, affinity, and diversity.
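To make the idea of a polynomial-based conditional flow for position concrete, the following is a minimal sketch under stated assumptions, not the formulation used by THFlow. It assumes a hypothetical schedule a(t) = 1 - (1 - t)^k on the peptide's centroid position, so that the conditional path is x_t = (1 - a(t)) x_0 + a(t) x_1 and the regression target is the velocity a'(t) (x_1 - x_0). The function names, the exponent `k`, and the choice of schedule are all illustrative; the abstract does not specify them. With k = 1 the sketch reduces to the standard linear flow-matching path, and with k > 1 the position closes most of the distance to the pocket early in the trajectory.

```python
import numpy as np


def poly_schedule(t, k=2.0):
    """Hypothetical polynomial schedule a(t) = 1 - (1 - t)**k (assumption).

    k = 1 gives the usual linear interpolation; k > 1 front-loads the motion.
    """
    return 1.0 - (1.0 - t) ** k


def poly_schedule_rate(t, k=2.0):
    """Time derivative a'(t) = k * (1 - t)**(k - 1) of the schedule above."""
    return k * (1.0 - t) ** (k - 1.0)


def position_path(x0, x1, t, k=2.0):
    """Conditional path x_t between the initial position x0 and the target x1."""
    a = poly_schedule(t, k)
    return (1.0 - a) * x0 + a * x1


def position_velocity(x0, x1, t, k=2.0):
    """Conditional velocity d x_t / dt, the target a flow model would regress."""
    return poly_schedule_rate(t, k) * (x1 - x0)


def remaining_distance(x0, x1, t, k=2.0):
    """Euclidean distance still to travel to the target at time t."""
    return float(np.linalg.norm(position_path(x0, x1, t, k) - x1))


if __name__ == "__main__":
    # Illustrative values only: a peptide centroid starting 10 units from the pocket.
    x0 = np.zeros(3)
    x1 = np.array([10.0, 0.0, 0.0])
    for k in (1.0, 3.0):
        print(f"k={k}: distance left at t=0.5 is {remaining_distance(x0, x1, 0.5, k)}")
```

Under this assumed schedule, a larger `k` leaves less distance to the pocket at any intermediate time, which is one way the position could converge before structural refinement begins. How rotation and torsion are then scheduled relative to position is not covered by this sketch.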