
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Physical models realize the transformer architecture of large language models

Created by
  • Haebom

Author

Zeqian Chen

Outline

The introduction of the Transformer architecture in 2017 was one of the most notable developments in natural language processing. The Transformer is a model architecture that relies entirely on attention mechanisms to draw global dependencies between input and output. However, the paper argues that there is a gap in the theoretical understanding of what the Transformer is and why it works physically. From a physical perspective on modern chips, the paper constructs physical models, as open quantum systems in the Fock space over the Hilbert space of tokens, that realize large language models based on the Transformer architecture. These physical models underlie the Transformer architecture for large language models.
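For reference, the attention mechanism the paper grounds physically can be made concrete with a minimal sketch of standard scaled dot-product attention. This is the textbook formulation from the 2017 Transformer, not code from the paper's quantum-mechanical model; the function names, shapes, and toy data below are illustrative assumptions.

```python
# Minimal sketch of scaled dot-product attention (standard Transformer building block).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token affinities
    weights = softmax(scores, axis=-1)   # each token attends over all tokens (global dependencies)
    return weights @ V                   # attention-weighted mixture of value vectors

# Toy usage: 4 tokens with 8-dimensional queries/keys/values (illustrative only)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)
```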

Takeaways, Limitations

Takeaways: Provides new insights into the physical basis of the Transformer architecture. By explaining the working principles of large language models from the perspective of open quantum systems, it opens new research directions.
Limitations: Experimental validation of the proposed physical model is lacking; the connection to realistic chip architectures needs further explanation; discussion of the model's generalizability and scalability is limited.