Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study

Created by
  • Haebom

Author

Yujun Zhou, Jiayi Ye, Zipeng Ling, Yufei Han, Yue Huang, Haomin Zhuang, Zhenwen Liang, Kehan Guo, Taicheng Guo, Xiangqi Wang, Xiangliang Zhang

Outline

This paper introduces FineLogic, a framework for evaluating the logical reasoning ability of large language models (LLMs). To overcome the limitations of existing evaluations that rely solely on final-answer accuracy, FineLogic assesses reasoning along three dimensions: overall accuracy, step-by-step soundness, and representation-level probing. The authors fine-tune LLMs under different supervision formats (natural language and symbolic) and analyze how each format affects reasoning performance.
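As a rough illustration only, here is a minimal Python sketch of what an evaluation loop over the first two dimensions (overall accuracy and step-by-step soundness) might look like. The helpers `model_generate`, `parse_steps`, and `step_entailed` are hypothetical placeholders for a model call, a chain-of-thought parser, and a step checker; they are not the paper's API.

```python
# Minimal sketch of a FineLogic-style evaluation loop (not the paper's code).
# `model_generate`, `parse_steps`, and `step_entailed` are hypothetical
# placeholders: a model call, a parser that splits a reasoning chain into
# steps plus a final answer, and a checker that tests whether a step
# follows from the premises together with the steps before it.

def evaluate(problems, model_generate, parse_steps, step_entailed):
    """Score a model on overall accuracy and step-by-step soundness."""
    correct = 0
    sound_steps = total_steps = 0
    for prob in problems:
        output = model_generate(prob["question"])        # full reasoning chain
        steps, answer = parse_steps(output)              # chain -> steps, answer
        correct += int(answer == prob["gold_answer"])    # dimension 1: accuracy
        context = list(prob["premises"])
        for step in steps:                               # dimension 2: soundness
            sound_steps += int(step_entailed(context, step))
            total_steps += 1
            context.append(step)                         # later steps may use it
    return {
        "overall_accuracy": correct / len(problems),
        "step_soundness": sound_steps / max(total_steps, 1),
    }
```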

Takeaways, Limitations

Takeaways:
  • Natural language supervision generalizes well to out-of-distribution and long-chain problems.
  • Symbolic supervision is effective for building structurally sound, atomic inference steps.
  • Fine-tuning primarily improves the model's step-by-step generation process (a generic probing sketch follows this section).
  • The FineLogic framework offers a novel approach to assessing and improving logical reasoning in LLMs.
Limitations:
  • This summary alone offers limited insight into FineLogic's specific implementation or detailed evaluation criteria.
  • Additional details are available in the paper's code (https://github.com/YujunZhou/FineLogic).
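To illustrate the third dimension, representation-level probing, the sketch below shows the generic linear-probe recipe: train a simple classifier on frozen hidden states and test whether a property (here, step validity) is linearly decodable. The random arrays are stand-ins for real per-step activations and labels; this is the standard probing technique, not the paper's implementation.

```python
# Generic linear-probe sketch for representation-level evaluation.
# In real usage, replace the random features with hidden states extracted
# from the LLM at each reasoning step, and the random labels with marks
# of whether each step is valid. High held-out probe accuracy would
# suggest the representations encode step validity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))  # stand-in for per-step activations
labels = rng.integers(0, 2, size=1000)        # stand-in validity labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")  # ~0.5 on random data
```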