Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
It is summarized using Google Gemini and is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

LightThinker: Thinking Step-by-Step Compression

Created by
  • Haebom

Author

Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

Outline

This paper proposes LightThinker, a novel method for improving the efficiency of large language models (LLMs) on complex reasoning tasks. Inspired by human cognitive processes, LightThinker dynamically compresses intermediate thoughts into compact representations during reasoning and discards the original reasoning chains, significantly reducing the number of tokens kept in the context window. This is achieved by training the model on when and how to compress: organizing the training data accordingly, mapping hidden states to condensed summary (gist) tokens, and constructing specialized attention masks. The authors also introduce the Dependency (Dep) metric, which quantifies the degree of compression by measuring how much generation relies on past tokens. Extensive experiments on four datasets and two models show that LightThinker reduces peak memory usage and inference time while maintaining competitive accuracy, improving LLM efficiency on complex reasoning tasks without compromising performance. The code is available at https://github.com/zjunlp/LightThinker .
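The Dependency (Dep) idea can be sketched in a few lines. This is a minimal illustration, assuming Dep counts, for each generated token, how many earlier tokens it may attend to under the attention mask and sums these counts; the function name and mask layouts below are illustrative assumptions, not the authors' implementation.

```python
def dependency(attention_mask):
    """Toy Dep score: attention_mask[i][j] is True if token i may attend
    to token j. Sum the number of visible tokens over all positions."""
    return sum(sum(1 for visible in row if visible) for row in attention_mask)

# Plain causal mask over 4 tokens: each token sees itself and all
# predecessors, so Dep = 1 + 2 + 3 + 4 = 10.
causal = [[j <= i for j in range(4)] for i in range(4)]

# Compressed variant (illustrative): after token 0 is generated, it is
# folded into a summary token at position 1; later tokens attend only to
# that summary plus tokens generated since, lowering the Dep score.
compressed = [
    [True,  False, False, False],
    [True,  True,  False, False],
    [False, True,  True,  False],  # sees summary token + itself
    [False, True,  True,  True],
]
```

A lower Dep score for the compressed mask reflects the paper's claim that discarding raw intermediate thoughts shrinks how much of the past each new token must attend to.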

Takeaways, Limitations

Takeaways:
  • Presents a novel method that effectively reduces the memory usage and inference time of LLMs.
  • Improves efficiency without sacrificing performance, via a compression technique inspired by human cognition.
  • The Dependency (Dep) metric allows the degree of compression to be measured quantitatively.
  • Open-source code ensures reproducibility and extensibility.
Limitations:
  • Further research is needed on the generalization performance of the proposed method.
  • Applicability and performance across diverse LLMs and tasks remain to be evaluated.
  • The potential for information loss during compression, and its impact on downstream answers, needs deeper analysis.