Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency

Created by
  • Haebom

Authors

Colin Hong, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov

Outline

This paper analyzes, both theoretically and experimentally, the inefficiency of Self-Consistency (SC), a Test-Time Scaling (TTS) technique, and proposes Slim-SC, a novel method that improves on it. SC generates multiple reasoning chains in parallel and selects the final answer by majority vote, but this incurs a high computational cost. Slim-SC applies a stepwise pruning strategy that exploits inter-chain similarity at the thought level to remove redundant chains during inference, reducing inference latency by up to 45% and KV cache (KVC) usage by up to 26% while maintaining or improving accuracy. This makes Slim-SC a simple yet efficient TTS alternative to SC.
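To make the pruning idea concrete, here is a minimal Python sketch of similarity-based chain pruning. Everything here is an illustrative assumption, not the paper's implementation: the function name `prune_redundant_chains`, the use of cosine similarity over chain embeddings, and the 0.9 threshold are all hypothetical, and the paper's actual similarity measure, checkpoint schedule, and pruning rule may differ.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def prune_redundant_chains(chain_embeddings: list[np.ndarray],
                           threshold: float = 0.9) -> list[int]:
    """Return indices of reasoning chains to keep generating.

    Greedy sketch: a chain is kept only if its embedding is not too
    similar to any chain already kept. (Hypothetical criterion; the
    paper's exact similarity measure and pruning schedule may differ.)
    """
    kept: list[int] = []
    for i, emb in enumerate(chain_embeddings):
        if all(cosine_similarity(emb, chain_embeddings[j]) < threshold
               for j in kept):
            kept.append(i)
    return kept

# Example: 4 partially generated chains, two of them near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
chains = [base, base + 0.01 * rng.normal(size=8),
          rng.normal(size=8), rng.normal(size=8)]
print(prune_redundant_chains(chains))  # the near-duplicate chain is pruned
```

Applying such a check at intermediate generation steps is what yields the savings: a pruned chain stops consuming decode compute and its KV cache entries can be freed, while the surviving diverse chains still support the final majority vote.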

Takeaways, Limitations

Takeaways:
  • Presents a theoretical and experimental analysis of the inefficiencies of SC, suggesting potential improvements.
  • Demonstrates that Slim-SC can significantly reduce the computational cost of SC while maintaining or improving accuracy.
  • Provides a simple and efficient alternative to SC.
  • Contributes to research on TTS techniques for improving the inference performance of LLMs.
Limitations:
  • The performance gains of Slim-SC may be limited to specific datasets and LLM architectures.
  • Further research may be needed on how best to measure similarity between reasoning chains at the thought level.
  • Experiments with a wider range of LLM architectures and datasets are needed.