This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper aims to improve the complex reasoning capabilities of large language models (LLMs), particularly their ability to solve mathematical problems. To understand how Transformer models propagate information, the authors design a multi-step reasoning task and compare direct answering against Chain-of-Thought (CoT) reasoning. They propose the concept of a "buffer mechanism," in which the model stores different pieces of information in separate buffers and selectively retrieves them as needed. Building on this, they introduce a random-matrix-based algorithm with only 132 learnable parameters. The algorithm improves performance on seven multi-step reasoning datasets, including PrOntoQA, LogicAsker, and LogicInference. This study offers new insight into the inner workings of LLMs.
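The buffer idea can be illustrated with a toy NumPy sketch. This is not the paper's implementation (the paper's 132-parameter algorithm is not reproduced here); it only shows the hedged intuition that nearly orthogonal random matrices let several pieces of information share one vector as separate "buffers," each retrievable with the matching projection. All dimensions and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # hypothetical hidden dimension

# One random projection ("buffer key") per piece of information.
# High-dimensional Gaussian matrices are nearly orthogonal to one
# another, so each buffer occupies an approximately independent subspace.
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]

# Three pieces of intermediate information to keep separate.
items = [rng.standard_normal(d) for _ in range(3)]

# Write: superpose all buffered items into a single state vector.
state = sum(Wk @ x for Wk, x in zip(W, items))

def read(state, k):
    """Read buffer k: project with the matching key's transpose.

    W[k].T @ W[k] is approximately the identity, while the cross terms
    W[k].T @ W[j] (j != k) contribute only small noise, so the result
    is approximately items[k].
    """
    return W[k].T @ state

# Retrieval is approximate, but each read aligns best with its own item.
for k in range(3):
    rec = read(state, k)
    sims = [np.dot(rec, x) / (np.linalg.norm(rec) * np.linalg.norm(x))
            for x in items]
    assert int(np.argmax(sims)) == k
```

The design choice being illustrated: because the random keys are fixed rather than learned, the mechanism separates information with very few trainable parameters, which is the spirit of the paper's parameter-efficient algorithm.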
Takeaways, Limitations
•
Takeaways:
◦
Offers a new understanding of how LLMs store and use information during reasoning.
◦
Demonstrates that the proposed random-matrix-based algorithm can effectively improve LLM reasoning ability with very few parameters.
◦
Confirms the generality of the algorithm through performance gains across a variety of multi-step reasoning datasets.
◦
Suggests new directions for LLM architecture design and training strategies.
•
Limitations:
◦
The effectiveness of the proposed algorithm may be limited to certain types of multi-step reasoning problems.
◦
The precise operating principles of the buffer mechanism require further analysis.
◦
Evaluation on more complex and diverse reasoning tasks is needed.
◦
Further research is needed on the algorithm's scalability and its applicability to other model architectures.