Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Positional Encoding via Token-Aware Phase Attention

Created by
  • Haebom

Author

Yu Wang, Sheng Shen, Rémi Munos, Hongyuan Zhan, Yuandong Tian

Outline

This paper demonstrates that Rotary Positional Embedding (RoPE) suffers from inherent distance-dependent biases that limit its ability to model long-range contexts under practical assumptions. RoPE extension methods can mitigate this problem, but they typically require post-training adjustments, such as recalibration or hyperparameter retuning. This paper proposes Token-Aware Phase Attention (TAPA), a novel positional encoding method that integrates a learnable phase function into the attention mechanism. TAPA preserves long-range token interactions, scales to longer contexts with direct and lightweight fine-tuning, extrapolates to unseen lengths, and achieves significantly lower perplexity on long contexts than the RoPE family.
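To make the idea of a learnable, token-dependent phase concrete, below is a minimal sketch of a single attention head in which each token's features produce a phase that modulates the query-key logits. The class name, the linear phase function, and the cosine modulation are all illustrative assumptions; the paper's exact TAPA formulation may differ.

```python
import torch
import torch.nn as nn


class TokenAwarePhaseAttentionSketch(nn.Module):
    """Illustrative single-head attention with a learnable, token-dependent
    phase added to the position signal. This is a sketch under assumptions,
    not the paper's actual TAPA implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Learnable phase function: maps each token's features to a scalar
        # phase, so positional modulation depends on content, not only
        # on absolute distance (hypothetical form).
        self.phase = nn.Linear(dim, 1)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device, dtype=x.dtype)

        # Per-token phase combined with position: (batch, seq_len)
        phi = self.phase(x).squeeze(-1)
        token_phase = pos.unsqueeze(0) * phi

        # Relative phase between every query-key pair: (batch, L, L)
        rel_phase = token_phase.unsqueeze(-1) - token_phase.unsqueeze(-2)

        # Standard scaled dot-product logits, modulated by a cosine of the
        # relative phase so interactions need not decay with distance.
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        scores = scores * torch.cos(rel_phase)

        attn = scores.softmax(dim=-1)
        return torch.matmul(attn, v)
```

In this sketch the phase enters the logits through a content-dependent term rather than a fixed rotation per position, which is the intuition behind "token-aware" phase encoding; the specific functional form used by TAPA should be taken from the paper.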

Takeaways, Limitations

Takeaways:
We point out the limitations of RoPE's long-range context modeling capability and analyze their causes.
We propose a novel positional encoding method, TAPA, which outperforms RoPE in long-range contexts without post-training adjustments (see the sketch above for the general idea).
TAPA demonstrates scalability to long contexts, ease of fine-tuning, and extrapolation to unseen lengths.
Limitations:
Details of the specific experimental setup and performance comparisons are not covered here; refer to the paper itself.
Further research may be needed on the practical implementation and application of TAPA.
Potential drawbacks or improvements of TAPA are not explicitly mentioned in the paper.