Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
Outline
This paper proposes LightThinker, a novel method for improving the efficiency of large language models (LLMs) on complex reasoning tasks. Inspired by human cognitive processes, LightThinker dynamically compresses intermediate thoughts into compact representations during reasoning and discards the original reasoning chains, significantly reducing the number of tokens kept in the context window. This is achieved by training the model, via data construction, on when and how to compress, by mapping hidden states onto condensed gist (summary) tokens, and by creating specialized attention masks. The authors also introduce the Dependency (Dep) metric, which quantifies the degree of compression by measuring reliance on historical tokens during generation. Extensive experiments on four datasets and two models show that LightThinker reduces peak memory usage and inference time while maintaining competitive accuracy, offering a new direction for making LLMs more efficient on complex reasoning tasks without sacrificing performance. The code is available at https://github.com/zjunlp/LightThinker.
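To make the compression idea concrete, below is a minimal, illustrative sketch (not the authors' released code) of two notions from the summary: a block attention mask in which tokens generated after a thought segment has been compressed can see the prompt and the gist tokens but no longer the raw thought tokens, and a simple Dependency-style count that sums how many past tokens are visible across all positions (lower means stronger compression). The segment lengths and the helper names `build_mask` and `dependency` are assumptions chosen for illustration.

```python
# Illustrative sketch only; segment sizes and helper names are assumptions,
# not the LightThinker implementation.
import torch

def build_mask(prompt_len, thought_len, gist_len, tail_len):
    """Boolean causal mask: entry [i, j] is True if position j is visible to position i."""
    n = prompt_len + thought_len + gist_len + tail_len
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # ordinary causal mask

    thought = slice(prompt_len, prompt_len + thought_len)
    tail_start = prompt_len + thought_len + gist_len
    # After compression, later tokens no longer attend to the raw thought tokens;
    # only the gist tokens (which did attend to the thought) carry that information.
    mask[tail_start:, thought] = False
    return mask

def dependency(mask):
    """Dep-style count: total number of visible past/current tokens over all positions."""
    return int(mask.sum().item())

full = torch.tril(torch.ones(16, 16, dtype=torch.bool))  # vanilla chain-of-thought baseline
compressed = build_mask(prompt_len=4, thought_len=8, gist_len=2, tail_len=2)

print("Dep without compression:", dependency(full))        # 136
print("Dep with compression:   ", dependency(compressed))  # 120
```

In a real reasoning trace with many long thought segments, each compression step removes a large block of thought tokens from the visible context, so the gap between the two Dep counts grows far beyond this toy example, which is what drives the reported savings in peak memory and inference time.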