This study explores pan-sharpening, which fuses a high-resolution panchromatic (PAN) image with a low-resolution multispectral (MS) image to generate a high-quality fused image. Existing CNN-based pan-sharpening methods rely on spatially constant convolutions, which apply the same kernel at every location. To overcome this limitation, we propose RAPNet, a novel architecture built on receptive-field adaptive convolution (RAPConv), which adapts to local features. RAPNet combines RAPConv with an attention-driven pan-sharpening dynamic feature fusion (PAN-DFF) module to balance spatial resolution and spectral accuracy. Experimental results on public datasets show that RAPNet outperforms existing methods in both quantitative and qualitative evaluations, and ablation studies confirm the effectiveness of the proposed adaptive component.
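The abstract contrasts spatially constant convolution with RAPConv, whose behavior adapts to local features. RAPConv's actual formulation is not given here, so the following is only a minimal NumPy sketch of the general idea under our own assumptions. `constant_conv` applies one fixed 3×3 kernel everywhere. The hypothetical `adaptive_conv` reweights that kernel at each pixel using a guide image (a joint-bilateral-style weighting with an assumed bandwidth `sigma`), standing in for a PAN-guided, locally adaptive operation. It is not the authors' RAPConv or PAN-DFF.

```python
import numpy as np


def constant_conv(x, kernel):
    """Spatially constant 3x3 filtering: the same kernel is applied at every pixel."""
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out


def adaptive_conv(x, kernel, guide, sigma=0.01):
    """Hypothetical locally adaptive 3x3 filtering (joint-bilateral style).

    The base kernel is reweighted at each pixel by how similar each neighbor
    is to the center pixel in `guide` (e.g., a PAN image), so the effective
    kernel changes with local image content. Assumes a non-negative kernel
    with a positive center weight, so the per-pixel sum is never zero.
    """
    px = np.pad(x, 1, mode="edge")
    pg = np.pad(guide, 1, mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Range weights: neighbors whose guide value differs from the center get ~0 weight.
            w = np.exp(-(pg[i:i + 3, j:j + 3] - guide[i, j]) ** 2 / sigma)
            k = kernel * w
            k /= k.sum()  # renormalize so the per-pixel kernel sums to 1
            out[i, j] = np.sum(px[i:i + 3, j:j + 3] * k)
    return out


# Toy example: a vertical step edge, with the image also used as its own guide.
img = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
box = np.full((3, 3), 1.0 / 9.0)
blurred = constant_conv(img, box)         # fixed kernel smears the edge
preserved = adaptive_conv(img, box, img)  # per-pixel kernel keeps the edge sharp
```

With a box kernel, `constant_conv` smears the step edge (the pixel beside it becomes about 1/3), whereas `adaptive_conv` leaves it nearly intact because weights across the edge collapse toward zero. Learned variants, such as dynamic-filter or deformable convolutions, instead predict the per-pixel kernels or sampling offsets with a network rather than using a fixed Gaussian rule.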