Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models

Created by
  • Haebom

Author

Chang Dai, Hongyu Shan, Mingyang Song, Di Liang

Outline

This paper proposes Hyperbolic Rotary Positional Encoding (HoPE), a novel positional encoding method inspired by the Lorentz transformations of hyperbolic geometry, to address the limitations of the positional encoding mechanisms used to model sequence structure and long-range dependencies in Transformer models. Conventional Rotary Positional Encoding (RoPE) hinders the modeling of long-range dependencies because it produces oscillating attention patterns; HoPE overcomes this by applying Lorentz rotations to token representations using hyperbolic functions. Theoretical analysis shows that RoPE is a special case of HoPE's generalized formulation, and that HoPE fundamentally resolves RoPE's problem by enforcing a monotonic decrease in attention weights as the inter-token distance increases. Experiments on various extended-sequence benchmarks show that HoPE outperforms existing positional encoding methods.
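
The following is a minimal NumPy sketch, not the paper's actual formulation: the frequencies, toy vectors, and the Lorentz-boost/Minkowski-metric setup are illustrative assumptions. It contrasts RoPE's oscillating relative-position term with a hyperbolic (Lorentz-boost) analogue whose score depends on token distance only through cosh/sinh, i.e. monotonically rather than oscillating; HoPE's full construction additionally parameterizes this so attention weights decrease monotonically with distance.

```python
# Sketch only: contrasts RoPE's oscillating relative-position term with a
# hyperbolic Lorentz-boost analogue. Not the paper's exact HoPE definition.
import numpy as np

def rope_rotate(v, pos, omega):
    """Standard RoPE on a 2-D feature pair: rotate by angle pos * omega."""
    theta = pos * omega
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ v

def lorentz_boost(v, pos, omega):
    """Hyperbolic analogue: Lorentz boost with rapidity pos * omega."""
    phi = pos * omega
    B = np.array([[np.cosh(phi), np.sinh(phi)],
                  [np.sinh(phi), np.cosh(phi)]])
    return B @ v

q = np.array([1.0, 0.3])    # toy query feature pair (illustrative)
k = np.array([0.8, -0.5])   # toy key feature pair (illustrative)
omega = 0.3                 # toy frequency (illustrative)
eta = np.diag([1.0, -1.0])  # Minkowski metric used for the hyperbolic score

for dist in [1, 2, 4, 8, 16]:
    m, n = 100, 100 - dist
    # RoPE score: q^T R(m*omega)^T R(n*omega) k = q^T R((n-m)*omega) k,
    # so it oscillates with distance through cos/sin of (m-n)*omega.
    rope_score = rope_rotate(q, m, omega) @ rope_rotate(k, n, omega)
    # Hyperbolic score: q^T B(m*omega)^T eta B(n*omega) k depends on (m-n)
    # only through cosh/sinh, so it varies monotonically with distance.
    hyp_score = lorentz_boost(q, m, omega) @ eta @ lorentz_boost(k, n, omega)
    print(dist, round(float(rope_score), 4), round(float(hyp_score), 4))
```

Running this shows the RoPE score rising and falling as the distance grows, while the hyperbolic score changes monotonically with no oscillation; how HoPE normalizes this into monotonically decreasing attention weights follows the paper.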

Takeaways, Limitations

Takeaways:
Presents HoPE, a novel positional encoding method that resolves RoPE's limitation of oscillating attention patterns.
Generalizes RoPE on a theoretical foundation drawn from hyperbolic geometry.
Improves long-range dependency modeling and extrapolation to long sequences.
Demonstrates superior performance compared to existing methods across various benchmarks.
Limitations:
Further research is needed to determine whether the effectiveness of the proposed method generalizes to all types of sequence data and Transformer models.
The detailed experimental results and code have not yet been released (release is planned).