This paper comprehensively analyzes research trends in efficient attention mechanisms that address the quadratic time and memory complexity of self-attention in Transformer-based architectures, the core framework of large-scale language models. Specifically, we focus on two major approaches, linear attention and sparse attention, and examine both the algorithmic innovations and the hardware considerations behind each. By analyzing cases where efficient attention has been applied to large-scale pre-trained language models, covering both architectures built solely on efficient attention and hybrid designs that combine local and global components, we aim to provide a foundation for designing scalable and efficient language models.
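To make the complexity contrast concrete, the following is a minimal illustrative sketch, not taken from the paper itself: $Q, K, V \in \mathbb{R}^{n \times d}$ denote the query, key, and value matrices for a sequence of length $n$ and head dimension $d$, and $\phi$ is a generic (assumed) feature map used in kernelized linear attention.

\[
\underbrace{\mathrm{Attn}(Q,K,V) \;=\; \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V}_{O(n^{2}d)\ \text{time},\ O(n^{2})\ \text{memory}}
\qquad \text{vs.} \qquad
\underbrace{\widehat{\mathrm{Attn}}(Q,K,V) \;=\; \frac{\phi(Q)\,\big(\phi(K)^{\top}V\big)}{\phi(Q)\,\big(\phi(K)^{\top}\mathbf{1}_{n}\big)}}_{O(nd^{2})\ \text{time},\ O(n)\ \text{extra memory}}
\]

In the kernelized form, associativity lets $\phi(K)^{\top}V$ be computed once in $O(nd^{2})$, avoiding the explicit $n \times n$ attention matrix. Sparse attention instead restricts each query to a subset of keys (for example, a local window plus a few global tokens), giving roughly $O(nk)$ cost for $k \ll n$ attended keys.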