Daily Arxiv

This page curates AI-related papers from around the world.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Improving Rationality in the Reasoning Process of Language Models through Self-playing Game

Created by
  • Haebom

Authors

Pinzheng Wang, Juntao Li, Zecheng Tang, Haijia Gui, Min Zhang

Outline

This paper proposes the Critic-Discernment Game (CDG), a self-play training scheme that addresses large language models' (LLMs') lack of genuine understanding of their own reasoning process. In CDG, a prover proposes a solution to a problem and a critic challenges it. Critiques are either helpful or misleading: the prover must hold on to a correct answer when the critique is misleading, and correct its errors when the critique is constructive. Experiments on mathematical reasoning, step-wise error detection, self-correction, and long-term reasoning tasks show that CDG training improves well-aligned LLMs' understanding of their reasoning process.
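To make the game dynamic concrete, here is a minimal sketch of one CDG round. This is an illustration under stated assumptions, not the paper's implementation: the `prover`, `critic`, and `is_correct` callables, the prompt format, the rule that misleading critiques target correct solutions, and the binary reward are all assumptions for exposition.

```python
# Minimal sketch of one Critic-Discernment Game (CDG) round.
# All interfaces here are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CDGRound:
    problem: str
    solution: str      # prover's initial solution
    critique: str      # critic's feedback on the solution
    misleading: bool   # True if the critique attacks a correct solution
    revision: str      # prover's response after seeing the critique

def play_round(problem: str,
               prover: Callable[[str], str],
               critic: Callable[[str, str, bool], str],
               is_correct: Callable[[str, str], bool]) -> tuple[CDGRound, float]:
    """Run one CDG round and return the transcript plus the prover's reward."""
    solution = prover(problem)
    # Assumption about the setup: the critic issues a misleading critique when
    # the solution is already correct, and a genuinely helpful one otherwise.
    misleading = is_correct(problem, solution)
    critique = critic(problem, solution, misleading)
    prompt = (f"{problem}\n\nYour solution:\n{solution}\n\n"
              f"Critique:\n{critique}\n\nRevise or defend your solution.")
    revision = prover(prompt)
    # In both cases the prover succeeds iff its revision is correct: it either
    # held firm against a misleading critique or fixed a genuine error.
    reward = 1.0 if is_correct(problem, revision) else -1.0
    return CDGRound(problem, solution, critique, misleading, revision), reward
```

Note that under this framing a single correctness check scores both cases, which is what lets the game be played via self-play without human or stronger-model supervision.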

Takeaways, Limitations

Takeaways:
  • Self-play CDG training can enhance LLMs' ability to understand their own reasoning process.
  • The method improves model rationality without supervision from humans or stronger models.
  • It shows potential for performance gains across diverse reasoning tasks, including mathematical reasoning, error detection, and self-correction.

Limitations:
  • The benefits of CDG may be limited to well-aligned LLMs; generalization to other kinds of LLMs requires further study.
  • The design and evaluation of the critic's critique strategies are not described in detail.
  • The generalizability and universality of the experimental results need further validation.
  • The training procedure and computational cost of CDG are not analyzed in detail.