Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hawkeye: Efficient Reasoning with Model Collaboration

Created by
  • Haebom

Author

Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho

Outline

This paper proposes HAWKEYE, a novel post-training and inference framework that addresses the efficiency problems of Chain-of-Thought (CoT) reasoning. Standard CoT generates large numbers of intermediate reasoning tokens, which drives up computational cost and latency. HAWKEYE instead has a large model produce concise CoT instructions, which a smaller model then uses to generate the final response. Reinforcement learning is used to quantify the redundancy in CoT reasoning and distill it into dense, informative instructions, sharply reducing token usage and compute while preserving response quality. Experiments show that HAWKEYE matches the response quality of full CoT using only 35% of the tokens, while improving clarity, consistency, and conciseness by roughly 10%. It also speeds up inference by up to 3.4x and cuts inference costs by up to 60% on complex mathematical problems. HAWKEYE will be open-sourced.
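The two-stage collaboration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`hawkeye_answer`), prompt formats, and stub models are all assumptions standing in for real LLM endpoints.

```python
# Hypothetical sketch of HAWKEYE-style model collaboration (not the paper's code).
# Stage 1: a large model emits a concise CoT instruction.
# Stage 2: a small model expands that instruction into the final answer.

def hawkeye_answer(question, large_model, small_model):
    """Two-stage inference: concise CoT from the large model,
    final response from the small model."""
    # The large model is prompted for a compressed reasoning outline,
    # rather than a full step-by-step chain of thought.
    concise_cot = large_model(f"Give a concise reasoning outline for: {question}")
    # The small model conditions on the question plus the outline,
    # so it only has to fill in the final answer.
    answer = small_model(f"Question: {question}\nOutline: {concise_cot}\nAnswer:")
    return concise_cot, answer

# Stub "models" standing in for real LLM calls, for illustration only.
large = lambda prompt: "1) isolate x  2) divide both sides by 2"
small = lambda prompt: "x = 4"

cot, ans = hawkeye_answer("Solve 2x = 8", large, small)
print(cot)
print(ans)
```

The key design point is that the expensive large model generates only the short instruction, while the cheap small model produces the bulk of the output tokens, which is where the reported token and cost savings come from.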

Takeaways, Limitations

Takeaways:
Presents a novel framework that can significantly improve the efficiency of CoT reasoning.
Reduces token usage and computational cost, making inference more economical.
Improves the clarity, consistency, and conciseness of responses.
Speeds up inference.
Improves accessibility through an open-source release.
Limitations:
HAWKEYE's performance may depend on the capabilities of both the large and small models.
Efficiency gains may be limited for certain types of problems.
Real-world performance still needs verification after the open-source release.