Existing adversarial patch attacks on visible-infrared (VIS-IR) dual-modal object detectors have limited effectiveness across diverse physical environments. In this paper, we propose CDUPatch, a general-purpose cross-modal adversarial patch attack for VIS-IR dual-modal object detection systems. CDUPatch introduces an RGB-to-IR adapter that maps RGB patches to IR patches, enabling unified optimization of the cross-modal patch. We learn an optimal color distribution to adjust the thermal response of the adversarial patch, and we introduce a multi-scale clipping strategy to build a new VIS-IR dataset, MSDrone, which contains aircraft images of various sizes and viewpoints. Experimental results show that CDUPatch outperforms existing patch attacks on four benchmark datasets (DroneVehicle, LLVIP, VisDrone, and MSDrone), and real-world physical tests demonstrate its robust transferability across scales, views, and scenarios.
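
To make the idea of mapping an RGB patch to an IR patch concrete, the following is a minimal sketch under stated assumptions, not the adapter used in CDUPatch. The function name `rgb_to_ir`, the luminance weights, and the rule that darker colors yield a stronger thermal response are all hypothetical choices made for illustration only. The point it shows is that a single RGB patch can determine both the visible and the infrared appearance, so only the RGB colors would need to be optimized.

```python
def rgb_to_ir(patch_rgb, weights=(0.299, 0.587, 0.114)):
    """Toy stand-in for an RGB-to-IR adapter (hypothetical, for illustration).

    Assumption: darker colors absorb more heat and therefore appear
    brighter in the infrared image, so IR intensity is modeled as
    1 minus a weighted luminance of the RGB color.

    patch_rgb: list of rows, each row a list of (r, g, b) tuples in [0, 1].
    Returns a list of rows of IR intensities in [0, 1].
    """
    patch_ir = []
    for row in patch_rgb:
        ir_row = []
        for r, g, b in row:
            luminance = weights[0] * r + weights[1] * g + weights[2] * b
            # Clamp to [0, 1] so the result is a valid intensity.
            ir_row.append(min(1.0, max(0.0, 1.0 - luminance)))
        patch_ir.append(ir_row)
    return patch_ir


# A 2x2 RGB patch: black, white, pure red, pure blue.
patch_rgb = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
]

# The IR patch is derived from the RGB patch rather than stored separately.
patch_ir = rgb_to_ir(patch_rgb)
```

In this toy mapping the black pixel receives the highest IR intensity and the white pixel the lowest. A learned adapter would replace the fixed weights with parameters fitted to measured thermal responses.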