6G networks are envisioned to support on-demand AI model downloads to meet users' diverse inference needs. By pre-caching models at edge nodes, users can retrieve requested models for on-device AI inference with low latency. However, the large size of modern AI models poses a serious challenge for edge caching given limited storage capacity, and simultaneously serving heterogeneous models over wireless channels is equally difficult. To address these challenges, we propose a fine-grained AI model caching and downloading system that exploits parameter reusability, which arises from the common practice of fine-tuning task-specific models while keeping the parameters of a pre-trained shared model fixed. The system selectively caches model parameter blocks (PBs) at edge nodes, eliminating redundant storage of reusable parameters across cached models. Furthermore, by incorporating coordinated multipoint (CoMP) broadcasting, it improves downlink spectrum utilization by delivering reusable PBs to multiple users simultaneously. Within this framework, we formulate the problem of minimizing model download latency by jointly optimizing PB caching, PB migration between edge nodes, and broadcast beamforming. To solve this problem, we develop a distributed multi-agent learning framework in which edge nodes collaborate by explicitly learning the interplay between their actions. We further propose a data augmentation approach that adaptively generates synthetic training samples with a predictive model, improving sample efficiency and accelerating policy learning. Both theoretical analysis and simulation experiments demonstrate the superior convergence performance of the proposed learning framework.
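To make the parameter-reusability idea concrete, the following is a minimal illustrative sketch (not the paper's actual system; the class and names such as `EdgeNodePBCache` are hypothetical) of content-addressed PB caching: each distinct parameter block is stored once, so fine-tuned models that share a frozen backbone do not duplicate it at the edge node.

```python
import hashlib


def pb_hash(pb_bytes: bytes) -> str:
    """Content hash used to identify a parameter block (PB)."""
    return hashlib.sha256(pb_bytes).hexdigest()


class EdgeNodePBCache:
    """Hypothetical PB-level cache: stores each distinct PB once,
    however many cached models reference it."""

    def __init__(self):
        self.store = {}    # PB hash -> PB bytes (deduplicated storage)
        self.models = {}   # model id -> ordered list of PB hashes

    def cache_model(self, model_id: str, pbs: list[bytes]) -> None:
        hashes = []
        for pb in pbs:
            h = pb_hash(pb)
            self.store.setdefault(h, pb)   # reusable PB is stored only once
            hashes.append(h)
        self.models[model_id] = hashes

    def storage_bytes(self) -> int:
        """Total bytes actually held, after deduplication."""
        return sum(len(pb) for pb in self.store.values())


# Two task-specific models sharing a frozen pre-trained backbone PB.
backbone = b"\x00" * 1024               # shared pre-trained block
head_a, head_b = b"\x01" * 64, b"\x02" * 64
cache = EdgeNodePBCache()
cache.cache_model("task_a", [backbone, head_a])
cache.cache_model("task_b", [backbone, head_b])
print(cache.storage_bytes())  # 1152 bytes, not 2176: backbone stored once
```

Under this assumption, caching the two models costs 1024 + 64 + 64 = 1152 bytes instead of 2 × 1088 = 2176, which is the storage saving the abstract attributes to eliminating redundant reusable parameters.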