This paper proposes Native Hybrid Attention (NHA) to address the quadratic complexity of Transformers, which otherwise excel at sequence modeling, while retaining the efficiency of linear attention and strengthening long-term contextual understanding. NHA is a hybrid architecture that maintains long-term context in key-value slots updated by a linear RNN and augments them with short-term tokens from a sliding window, integrating intra- and inter-layer hybridization in a unified layer design. A single softmax attention operation over both sources provides context-dependent weights for each token and head without additional fusion parameters, and the sliding window size allows smooth adjustment between purely linear and full attention. Experimental results show that NHA outperforms Transformers and other hybrid approaches on recall-intensive and common-sense reasoning tasks. By structurally incorporating NHA into a pre-trained LLM, we achieve competitive accuracy while improving efficiency.
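To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of one NHA-style attention head at a single decoding step: a linear RNN maintains a fixed set of long-term key-value slots, the most recent tokens in a sliding window supply short-term keys and values, and one softmax attention over the concatenation fuses both without extra fusion parameters. The names and the specific exponential-decay recurrence (`n_slots`, `window`, `decay`) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a single NHA-style head (single decoding step).
import torch
import torch.nn.functional as F

def nha_head_step(q_t, k_hist, v_hist, slot_k, slot_v, decay, window):
    """One decoding step for a single head.

    q_t:     (d,)         query for the current token
    k_hist:  (t, d)       keys of all tokens seen so far (current token last)
    v_hist:  (t, d)       values of all tokens seen so far
    slot_k:  (n_slots, d) long-term key slots maintained by a linear RNN
    slot_v:  (n_slots, d) long-term value slots
    decay:   (n_slots, 1) per-slot forgetting rate in (0, 1) -- assumed recurrence
    window:  int          sliding-window length for the short-term branch
    """
    # Linear-RNN slot update: mix the newest token's key/value into every slot
    # with exponential decay (a stand-in for the paper's actual recurrence).
    k_new, v_new = k_hist[-1], v_hist[-1]
    slot_k = decay * slot_k + (1 - decay) * k_new
    slot_v = decay * slot_v + (1 - decay) * v_new

    # Short-term branch: keys/values of the last `window` tokens.
    win_k, win_v = k_hist[-window:], v_hist[-window:]

    # Single softmax over long-term slots + sliding-window tokens; the softmax
    # itself produces the context-dependent fusion weights, so no extra
    # fusion parameters are needed.
    keys = torch.cat([slot_k, win_k], dim=0)        # (n_slots + w, d)
    vals = torch.cat([slot_v, win_v], dim=0)
    scores = keys @ q_t / keys.shape[-1] ** 0.5     # (n_slots + w,)
    attn = F.softmax(scores, dim=0)
    out = attn @ vals                                # (d,)
    return out, slot_k, slot_v
```

Under these assumptions, enlarging the window lets the short-term branch cover more of the raw history (approaching full attention), while shrinking it leaves the linear-RNN slots to carry most of the context, mirroring the abstract's claim that the window size adjusts smoothly between the linear and full-attention regimes.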