The introduction of the Transformer architecture in 2017 was one of the most notable developments in natural language processing. Transformers are model architectures that rely entirely on attention mechanisms to draw global dependencies between inputs and outputs. However, this paper argues that there is a gap in the theoretical understanding of what a Transformer is and why it works from a physical point of view. In this paper, we construct a physical model, formulated as an open quantum system in the Fock space over the Hilbert space of tokens, that describes how a large language model based on the Transformer architecture is implemented physically on modern chips. This physical model serves as the physical basis of the Transformer architecture for large language models.
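As a minimal sketch of the underlying construction (the symmetrization convention and the notation $\mathcal{H}_{\mathrm{tok}}$ are assumptions made here for illustration, not the paper's definitive definitions), let $\mathcal{H}_{\mathrm{tok}}$ denote the Hilbert space spanned by the token basis states; the full (unsymmetrized) Fock space over it is
\[
\mathcal{F}(\mathcal{H}_{\mathrm{tok}}) \;=\; \bigoplus_{n=0}^{\infty} \mathcal{H}_{\mathrm{tok}}^{\otimes n},
\]
so that a token sequence of length $n$ corresponds to a vector in the $n$-fold tensor sector $\mathcal{H}_{\mathrm{tok}}^{\otimes n}$, and sequences of different lengths live in orthogonal sectors of the same space.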