Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Lossless Token Sequence Compression via Meta-Tokens

Created by
  • Haebom

Authors

John Harvill, Ziwei Fan, Hao Wang, Luke Huan, Anoop Deoras, Yizhou Sun, Hao Ding

Outline

Unlike previous research on prompt compression for large language models (LLMs), which primarily relies on lossy methods that sacrifice semantic information, this paper presents a task-independent, lossless compression technique similar to LZ77. On two evaluation tasks, we demonstrate that the proposed technique shortens input token sequences by 27% and 18%, respectively. Because attention cost in a transformer-based LLM grows quadratically with sequence length, these reductions cut encoding computation by 47% and 33%, respectively (1 - 0.73^2 ≈ 47%, 1 - 0.82^2 ≈ 33%). We emphasize that the token sequence transformation is easily reversible, so no semantic information is lost. We evaluate the proposed method on two tasks that require exact preservation of semantic/syntactic information, and show that existing lossy compression methods underperform in these settings. The lossless technique shows only a small performance gap relative to uncompressed inputs, and we expect this gap to disappear entirely with larger models and increased computational budgets.
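
The summary does not spell out how the paper constructs its meta-tokens, so the following is only a minimal sketch of the general idea under the assumption of an LZ77-style scheme: repeated token subsequences are replaced by meta-tokens acting as back-references into the already-seen prefix, and the original sequence can be reconstructed exactly. The names compress, decompress, and MIN_MATCH are hypothetical, and the paper's actual construction (e.g., meta-tokens added to the model vocabulary) may differ.

```python
# Hypothetical sketch of LZ77-style lossless token-sequence compression.
# A meta-token here is a (distance, length) back-reference into the prefix.
from typing import List, Tuple, Union

Token = int
MetaToken = Tuple[int, int]        # (distance back, match length)
Symbol = Union[Token, MetaToken]

MIN_MATCH = 3                      # shorter repeats are not worth a meta-token


def compress(tokens: List[Token]) -> List[Symbol]:
    """Greedily replace repeated subsequences with meta-tokens (lossless)."""
    out: List[Symbol] = []
    i, n = 0, len(tokens)
    while i < n:
        best_len, best_dist = 0, 0
        for start in range(i):
            length = 0
            # Extend the match while it stays inside the already-seen prefix.
            while (start + length < i and i + length < n
                   and tokens[start + length] == tokens[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= MIN_MATCH:
            out.append((best_dist, best_len))   # meta-token
            i += best_len
        else:
            out.append(tokens[i])               # literal token
            i += 1
    return out


def decompress(symbols: List[Symbol]) -> List[Token]:
    """Invert compress() exactly; no information is lost."""
    out: List[Token] = []
    for sym in symbols:
        if isinstance(sym, tuple):              # meta-token: copy from history
            dist, length = sym
            start = len(out) - dist
            out.extend(out[start:start + length])
        else:
            out.append(sym)
    return out


if __name__ == "__main__":
    seq = [5, 9, 2, 7, 5, 9, 2, 7, 5, 9, 2, 7, 3]
    packed = compress(seq)
    assert decompress(packed) == seq            # lossless round trip
    print(f"{len(seq)} tokens -> {len(packed)} symbols: {packed}")
```

On the toy sequence above, 13 tokens become 7 symbols and the round trip reproduces the input exactly, which is the lossless property the paper relies on.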

Takeaways, Limitations

Takeaways:
We present a task-independent lossless compression technique for LLM prompts, demonstrating that the input token sequence length can be significantly reduced.
It effectively reduces the encoding computation of transformer-based LLMs.
It outperforms existing lossy compression methods on tasks where exact preservation of semantic/syntactic information is important.
Limitations:
Even though the compression is lossless, a small performance gap remains compared to uncompressed input.
The evaluation was limited to two tasks, requiring further research on generalizability.
The expectation that the performance gap disappears with larger models and expanded compute budgets has not been experimentally confirmed.