Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Native Hybrid Attention for Efficient Sequence Modeling

Created by
  • Haebom

Author

Jusen Du, Jiaxi Hu, Tao Zhang, Weigao Sun, Yu Cheng

Outline

This paper proposes Native Hybrid Attention (NHA) to address the quadratic complexity of Transformers, which otherwise excel at sequence modeling, while retaining the efficiency of linear attention and strengthening long-term contextual understanding. NHA is a hybrid architecture that unifies intra-layer and inter-layer hybridization: long-term context is kept in key-value slots updated by a linear RNN, and short-term tokens are added through a sliding window. A single softmax attention operation then yields context-dependent weights over both components for each token and head, with no additional fusion parameters, and the sliding window size allows smooth adjustment between purely linear and full attention. Experimental results show that NHA outperforms Transformers and other hybrid baselines on recall-intensive and commonsense reasoning tasks, and structurally converting a pre-trained LLM to NHA achieves competitive accuracy with improved efficiency.
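To make the mechanism concrete, below is a minimal single-head sketch of the idea described above: long-term key-value slots maintained by a linear recurrence plus a sliding window of recent tokens, all weighted by one softmax. The function name nha_attention_sketch, the decayed slot-update rule, the random token-to-slot routing, and all shapes are illustrative assumptions, not the paper's exact design.

```python
# Minimal single-head sketch of the NHA idea (illustrative assumptions, not the paper's design).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nha_attention_sketch(x, w=4, m=8, decay=0.9, seed=0):
    """x: (T, d) token embeddings; w: sliding-window size; m: number of long-term slots."""
    T, d = x.shape
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Long-term memory: m key-value slots updated by a simple linear RNN.
    slot_k = np.zeros((m, d))
    slot_v = np.zeros((m, d))
    # Fixed random routing of tokens to slots (an illustrative assumption).
    route = rng.integers(0, m, size=T)

    out = np.zeros_like(v)
    for t in range(T):
        # Linear-RNN style update: decayed accumulation of the current key/value into one slot.
        s = route[t]
        slot_k[s] = decay * slot_k[s] + (1 - decay) * k[t]
        slot_v[s] = decay * slot_v[s] + (1 - decay) * v[t]

        # Short-term context: raw keys/values inside the sliding window.
        lo = max(0, t - w + 1)
        keys = np.concatenate([slot_k, k[lo:t + 1]], axis=0)
        vals = np.concatenate([slot_v, v[lo:t + 1]], axis=0)

        # One softmax over long-term slots and short-term tokens jointly,
        # so the weighting is context-dependent and needs no fusion parameters.
        attn = softmax(keys @ q[t] / np.sqrt(d))
        out[t] = attn @ vals
    return out

if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((16, 32))
    y = nha_attention_sketch(x, w=4, m=8)
    print(y.shape)  # (16, 32)
```

Because each token attends to only m slots plus at most w recent tokens, the per-token cost stays constant when w is fixed, while growing w toward the sequence length recovers ordinary full attention.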

Takeaways, Limitations

Takeaways:
  • NHA presents a novel hybrid attention architecture that improves efficiency while maintaining long-term contextual awareness.
  • It outperforms existing models on recall-intensive and commonsense reasoning tasks.
  • Structurally integrating NHA into pre-trained LLMs shows promise for improving both efficiency and accuracy.
  • The sliding window size allows flexible adjustment between linear and full attention (see the toy calculation at the end of this section).
Limitations:
  • The paper does not explicitly discuss its limitations, so NHA's performance and efficiency still need to be verified across a broader range of tasks and datasets.
  • Optimization strategies for choosing the sliding window size require further study.
  • Further research is needed on the scalability of NHA.
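The toy calculation below (an illustrative assumption, not from the paper) makes the linear-to-full interpolation concrete: it counts the key/value positions each token visits given a window size w and m long-term slots. A small fixed w gives roughly T·(w + m) visits, i.e., linear in sequence length, while w equal to the sequence length makes the count grow quadratically, as in full attention.

```python
# Toy cost comparison for the window-size interpolation (illustrative assumption).
def attended_positions(T, w, m):
    """Total key/value positions visited over a length-T sequence with window w and m slots."""
    return sum(min(t + 1, w) + m for t in range(T))

T = 1024
for w in (8, 64, T):  # small window ~ linear cost, w = T ~ full attention
    print(f"w={w:4d}  positions={attended_positions(T, w, m=8)}")
```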