Multi-head Latent Attention (MLA), proposed by DeepSeek, is an innovative attention architecture that compresses the Key-Value (KV) cache into low-dimensional latent vectors, enabling efficient and economical inference. This paper proposes MHA2MLA, the first data-efficient fine-tuning method for transitioning a model from standard Multi-Head Attention (MHA) to MLA. MHA2MLA combines two components: partial-RoPE, which retains rotary position embeddings only on the query and key dimensions that contribute most to the attention scores, and a low-rank approximation of the KV cache obtained from a joint SVD of the pre-trained key and value projection parameters. With these strategies, the original performance can be recovered by fine-tuning on only a small fraction of the data, reducing inference costs while remaining compatible with compression techniques such as KV cache quantization. For Llama2-7B, we reduce the KV cache size by 92.19% with only a 0.5% drop in LongBench performance.
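As a rough illustration of the low-rank step, the sketch below jointly factorizes a head's key and value projection weights with a truncated SVD into a shared down-projection (whose output is the cached latent vector) and separate key/value up-projections. The shapes, the rank `r_kv`, and names such as `joint_svd_kv`, `W_down`, `W_up_k`, and `W_up_v` are illustrative assumptions, not the paper's exact implementation; in particular, this sketch ignores the split between RoPE and non-RoPE dimensions introduced by partial-RoPE.

```python
# Minimal sketch of a joint SVD low-rank approximation of the KV projections,
# assuming per-head weights W_k, W_v of shape (d_model, d_head) and a chosen
# latent rank r_kv. Names and shapes are hypothetical, for illustration only.
import torch

def joint_svd_kv(W_k: torch.Tensor, W_v: torch.Tensor, r_kv: int):
    """Factorize [W_k | W_v] ~= W_down @ [W_up_k | W_up_v].

    W_down : (d_model, r_kv)  shared down-projection -> cached latent vector
    W_up_k : (r_kv, d_head)   reconstructs keys from the latent
    W_up_v : (r_kv, d_head)   reconstructs values from the latent
    """
    d_head = W_k.shape[1]
    W_kv = torch.cat([W_k, W_v], dim=1)               # (d_model, 2 * d_head)
    U, S, Vh = torch.linalg.svd(W_kv, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :r_kv], S[:r_kv], Vh[:r_kv, :]
    W_down = U_r * S_r                                 # absorb singular values
    W_up_k, W_up_v = Vh_r[:, :d_head], Vh_r[:, d_head:]
    return W_down, W_up_k, W_up_v

# Usage: cache only h @ W_down (r_kv values per token) instead of full K and V.
d_model, d_head, r_kv = 4096, 128, 32
W_k, W_v = torch.randn(d_model, d_head), torch.randn(d_model, d_head)
W_down, W_up_k, W_up_v = joint_svd_kv(W_k, W_v, r_kv)
h = torch.randn(1, d_model)                            # a single hidden state
latent = h @ W_down                                    # this is what gets cached
k_approx, v_approx = latent @ W_up_k, latent @ W_up_v  # rebuilt at attention time
```

Because keys and values are factorized jointly, one latent vector per token serves both reconstructions, which is what makes the cached representation much smaller than storing K and V separately.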