This paper proposes Dynamic Mask Attention (DMA), a learnable dynamic-mask sparse attention mechanism that addresses the quadratic complexity of standard self-attention, which becomes increasingly limiting as the demand for long-text modeling grows. DMA exploits both content-aware and position-aware sparsity to reduce computational cost while minimizing information loss: content-aware sparse masks are dynamically generated from value representations so that attention concentrates on important information, while position-aware sparsity allows the computation to skip unnecessary regions. Under the Chinchilla scaling-law setting, DMA achieves lower perplexity than multi-head attention, sliding-window attention, multi-head latent attention, and conventional sparse attention, and it shows superior performance and efficiency on multi-query associative recall tasks. Notably, in an evaluation of a 1.7-billion-parameter model, DMA outperforms multi-head attention on both standard benchmarks and the needle-in-a-haystack task.
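To make the two sparsity sources concrete, the following is a minimal sketch of the idea described above, not the paper's implementation. The learned gate over value representations (`mask_proj`), the top-k selection, and the causal sliding-window pattern are assumptions chosen for illustration, and the dense `masked_fill` stands in for a kernel that would actually skip the masked regions.

```python
# Illustrative sketch of dynamic-mask sparse attention (assumptions noted above).
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, mask_proj, window=128, keep=256):
    """q, k, v: (batch, heads, seq, dim); mask_proj: (dim,) learned gate vector (assumed)."""
    b, h, n, d = q.shape
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5            # (b, h, n, n)

    # Content-aware sparsity (assumed form): score each key position from its
    # value representation via a learned gate, then keep only the top-k positions.
    gate = torch.einsum("bhnd,d->bhn", v, mask_proj)          # (b, h, n)
    topk = gate.topk(min(keep, n), dim=-1).indices
    content_mask = torch.zeros(b, h, n, device=q.device).scatter_(-1, topk, 1.0).bool()

    # Position-aware sparsity (assumed pattern): causal sliding window; regions
    # outside the window that are not content-selected could be skipped entirely.
    i = torch.arange(n, device=q.device)
    causal = i[None, :] <= i[:, None]
    pos_mask = causal & (i[:, None] - i[None, :] < window)    # (n, n)

    allowed = (pos_mask[None, None] | content_mask[:, :, None, :]) & causal[None, None]
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

In a practical kernel, the combined mask would determine which attention blocks are computed at all, which is where the efficiency gain over dense attention would come from.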