Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models

Created by
  • Haebom

Authors

Chang Dai, Hongyu Shan, Mingyang Song, Di Liang

Outline

This paper introduces Hyperbolic Rotary Positional Encoding (HoPE), an approach that addresses the limitations of the positional encoding mechanisms Transformer models use to capture sequential structure and long-range dependencies. Existing absolute positional encodings struggle to extrapolate to long sequences because their positional representations are fixed. Relative approaches, such as ALiBi, degrade in very long contexts. The widely used Rotary Positional Encoding (RoPE) struggles to model long-range dependencies reliably because its attention patterns oscillate with distance. HoPE, inspired by the Lorentz transformation in hyperbolic geometry, addresses these issues by applying Lorentz rotations to token representations using hyperbolic functions. Theoretical analysis shows that RoPE is a special case of HoPE's generalized formulation, and that HoPE resolves RoPE's oscillation problem by enforcing a monotonic decrease in attention weights as the inter-token distance increases. Experimental results, including perplexity evaluations on several extended-sequence benchmarks, show that HoPE consistently outperforms existing positional encoding methods. These results highlight HoPE's improved ability to represent and generalize long-range dependencies. The authors state that the data and code will be made public.
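To make the contrast above concrete, here is a minimal sketch in plain Python. The first part implements the standard RoPE rotation and shows that the resulting attention score depends only on the relative distance between tokens and does not decrease monotonically with that distance. The second part shows a generic hyperbolic (Lorentz) rotation built from cosh and sinh. This summary does not give HoPE's exact formulation, so `lorentz_rotate` is only an illustration of the underlying geometry and should not be read as the authors' method. The function names, the dimension of 8, and the base of 10000 are assumptions chosen for the example.

```python
import math

def rope_rotate(pair, pos, theta):
    """Standard RoPE step: rotate a 2-D feature pair by the angle pos * theta."""
    x, y = pair
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)

def rope_score(q, k, m, n, thetas):
    """Dot product of q rotated for position m and k rotated for position n."""
    total = 0.0
    for i, theta in enumerate(thetas):
        qx, qy = rope_rotate(q[2 * i:2 * i + 2], m, theta)
        kx, ky = rope_rotate(k[2 * i:2 * i + 2], n, theta)
        total += qx * kx + qy * ky
    return total

def lorentz_rotate(pair, phi):
    """Generic hyperbolic rotation by phi (illustrative only, NOT HoPE itself)."""
    x, y = pair
    c, s = math.cosh(phi), math.sinh(phi)
    return (x * c + y * s, x * s + y * c)

# Assumed toy setup: 8-dimensional vectors, RoPE frequencies with base 10000.
dim = 8
thetas = [10000 ** (-2 * i / dim) for i in range(dim // 2)]
q = [1.0] * dim
k = [1.0] * dim

# RoPE scores as a function of the distance between query and key positions.
scores = [rope_score(q, k, d, 0, thetas) for d in range(64)]

# Distances at which the score goes back up: a non-empty list means the
# score is not monotonically decreasing in distance.
rises = [d for d in range(1, 64) if scores[d] > scores[d - 1]]
print("distances where the RoPE score increases:", rises)

# Hyperbolic rotations compose additively and preserve x^2 - y^2,
# mirroring how circular rotations preserve x^2 + y^2.
p = (1.0, 0.5)
once = lorentz_rotate(p, 0.7)
twice = lorentz_rotate(lorentz_rotate(p, 0.3), 0.4)
print("composed:", twice, "direct:", once)
```

The additive composition of hyperbolic rotations is the same property that lets RoPE express attention in terms of relative position. How HoPE uses this to obtain monotonically decaying attention is detailed in the paper rather than in this summary.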

Takeaways, Limitations

Takeaways:
Presents HoPE, a new positional encoding technique that overcomes limitations of existing methods (RoPE, ALiBi, etc.).
Models long-range dependencies reliably, even in long sequences.
Resolves RoPE's oscillation problem and improves performance.
Provides a theoretical foundation grounded in hyperbolic geometry.
Demonstrates superior performance over existing methods on several benchmarks.
Limitations:
The details released so far are insufficient to assess practical implementation and application.
Further research is needed to determine generalizability to other types of sequence data or tasks.
Additional performance evaluation on extremely long sequences is needed.
Computational cost and memory usage still need to be analyzed.