This paper introduces Compressed Convolutional Attention (CCA), a novel attention method that reduces the training and serving costs of long-context Transformers. CCA performs the full attention operation inside a shared latent space by down-projecting queries, keys, and values, thereby compressing parameters, KV-cache, and FLOPs simultaneously. We further propose Compressed Convolutional Group Query Attention (CCGQA), which combines CCA with head sharing to improve computational and bandwidth efficiency even further. Experimental results demonstrate that CCGQA outperforms GQA and MLA, achieving 8x KV-cache compression relative to conventional MHA on MoE models without any performance degradation.
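To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of attention computed in a shared down-projected latent space; all module and dimension names (LatentSpaceAttention, d_latent, and so on) are illustrative assumptions, and CCA's convolutional mixing as well as the CCGQA head-sharing variant are omitted because the abstract does not detail them.

```python
# Minimal sketch: attention performed entirely in a down-projected latent space,
# so the cached K/V (and the attention FLOPs) live at the smaller latent width.
# Names and shapes are illustrative, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentSpaceAttention(nn.Module):
    def __init__(self, d_model: int, d_latent: int, n_heads: int):
        super().__init__()
        assert d_latent % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_latent // n_heads
        # Down-projections into the shared latent space (d_latent < d_model).
        self.down_q = nn.Linear(d_model, d_latent, bias=False)
        self.down_kv = nn.Linear(d_model, 2 * d_latent, bias=False)
        # Up-projection back to the model width.
        self.up_out = nn.Linear(d_latent, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.down_q(x)
        # In a decoder, only this compressed latent pair would be cached.
        k, v = self.down_kv(x).chunk(2, dim=-1)
        # Split the latent space into heads and run standard causal attention there.
        q, k, v = (
            z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            for z in (q, k, v)
        )
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.up_out(out)
```

Because queries, keys, and values all share one latent space in this sketch, the KV-cache and attention compute scale with d_latent rather than d_model, which is the source of the compression the abstract describes.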