
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

BriLLM: Brain-inspired Large Language Model

Created by
  • Haebom

Author

Hai Zhao, Hongqiu Wu, Dongjie Yang, Anni Zou, Jiale Hong

Outline

This paper presents BriLLM, the first brain-inspired large language model, whose architecture differs from existing Transformer- and GPT-style models. BriLLM is a neural network built on the definition of signal fully-connected flow (SiFu) over a directed graph; unlike existing models, whose interpretability is limited to inputs and outputs, it offers interpretability for every node in the graph. Tokens are defined as nodes in the graph, and signals flow between nodes according to the principle of "least resistance." The next token is the target of the signal flow, and because the model size is independent of input and prediction length, the model can in theory support infinitely long n-gram modeling. The signal flow also allows for re-activation and multi-modal support resembling the cognitive patterns of the human brain. The currently released Chinese version of BriLLM (4,000-token vocabulary, 32-dimensional node width, 16-token sequence prediction) achieves performance comparable to GPT-1.
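
Below is a minimal toy sketch of the SiFu signal-flow idea summarized above, written in PyTorch. It is not the authors' released implementation: the edge parameterization, the initial signal, and the energy score used to pick the "least resistance" target are illustrative assumptions, and the vocabulary and node width are shrunk here (the released Chinese model uses 4,000 tokens and 32-dimensional nodes).

```python
import torch

# Toy illustration of SiFu (signal fully-connected flow), NOT the authors' code.
# Every token is a node; every ordered node pair has its own edge weights.
# The parameter count depends only on the vocabulary size and node width, not
# on the sequence length, which is why the context can in principle grow
# without changing the model size.
VOCAB_SIZE = 100   # toy vocabulary (released model: 4,000 tokens)
NODE_DIM = 16      # toy node width (released model: 32)

# A dense (V, V, d, d) edge tensor is fine at toy scale; a full-size model
# would need a sparser or factorized parameterization.
edge_weights = torch.randn(VOCAB_SIZE, VOCAB_SIZE, NODE_DIM, NODE_DIM) * 0.02
edge_bias = torch.zeros(VOCAB_SIZE, VOCAB_SIZE, NODE_DIM)


def propagate(signal: torch.Tensor, src: int) -> torch.Tensor:
    """Flow the signal out of node `src` along all of its outgoing edges.

    Returns a (VOCAB_SIZE, NODE_DIM) tensor: the signal as it arrives at
    each candidate target node.
    """
    out = torch.einsum("vij,j->vi", edge_weights[src], signal) + edge_bias[src]
    return torch.tanh(out)


def next_token(sequence: list[int]) -> int:
    """Predict the next token as the node the signal reaches most strongly,
    i.e. the target of the 'least resistance' flow."""
    signal = torch.ones(NODE_DIM)  # assumed initial signal at the first node
    # Route the signal along the observed token path, node by node.
    for src, dst in zip(sequence, sequence[1:]):
        signal = propagate(signal, src)[dst]
    # From the last node, let the signal flow toward every candidate node and
    # choose the node where it arrives with the greatest energy.
    arrivals = propagate(signal, sequence[-1])
    energy = arrivals.norm(dim=-1)  # signal strength at each candidate node
    return int(energy.argmax())


print(next_token([3, 17, 42]))  # hypothetical token ids
```

In this sketch, prediction is simply "which node receives the strongest signal"; training would adjust the edge tensors so that the correct next token becomes the maximum-energy target, which is what makes every node (token) in the graph directly inspectable.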

Takeaways, Limitations

Takeaways:
  • Presents a large-scale language model with a new architecture distinct from existing Transformer-based models.
  • Provides interpretability across the entire model.
  • Can theoretically support infinitely long n-gram modeling.
  • Suggests the possibility of re-activation and multi-modal support resembling the cognitive patterns of the human brain.
  • Achieves GPT-1-level performance with relatively few resources.
Limitations:
  • The currently released model is small (4,000-token vocabulary) and limited to short sequences (16 tokens).
  • Training demands substantial computational resources, and performance and efficiency at larger scales remain to be verified.
  • Support for other languages, including English, is still lacking.
  • Although the model claims brain inspiration, its exact correspondence to how the brain actually works requires further explanation.