This paper introduces FineLogic, a framework for evaluating the logical reasoning performance of large language models (LLMs). To overcome the limitations of existing evaluation methods that rely solely on final-answer accuracy, FineLogic assesses logical reasoning across three dimensions: overall accuracy, step-by-step soundness, and representation-level probing. The authors fine-tune LLMs under different supervision formats (natural language and symbolic) and analyze how each format affects reasoning performance.
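The summary does not describe FineLogic's implementation, so the sketch below is purely illustrative: the data fields, scoring rules, and helper names are assumptions, meant only to show how final-answer accuracy and step-by-step soundness could be scored over model-generated reasoning chains.

```python
# Illustrative sketch only: FineLogic's actual implementation is not described in this
# summary, so the data structure and scoring rules below are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class ReasoningChain:
    steps: List[str]        # model-generated intermediate inference steps
    final_answer: str       # model's final answer
    gold_answer: str        # reference answer
    step_valid: List[bool]  # hypothetical per-step validity judgments (e.g., from a verifier)


def overall_accuracy(chains: List[ReasoningChain]) -> float:
    """Fraction of problems whose final answer matches the reference."""
    return sum(c.final_answer == c.gold_answer for c in chains) / len(chains)


def stepwise_soundness(chains: List[ReasoningChain]) -> float:
    """Average fraction of intermediate steps judged logically valid per problem."""
    per_chain = [sum(c.step_valid) / len(c.step_valid) for c in chains if c.step_valid]
    return sum(per_chain) / len(per_chain)


# Representation-level probing would additionally train a lightweight classifier on the
# model's hidden states to test whether intermediate facts are decodable; it is omitted
# here because it depends on model internals the summary does not specify.

if __name__ == "__main__":
    demo = [ReasoningChain(steps=["A -> B", "B -> C"], final_answer="C",
                           gold_answer="C", step_valid=[True, True])]
    print(overall_accuracy(demo), stepwise_soundness(demo))
```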
Takeaways, Limitations
• Takeaways:
◦ Natural language supervision shows strength in generalizing to out-of-distribution and long-chain problems.
◦ Symbolic supervision is effective for building structurally sound, atomic inference steps.
◦ Fine-tuning primarily improves the model's step-by-step generation of reasoning.
◦ The FineLogic framework presents a novel approach to assessing and improving logical reasoning in LLMs.
• Limitations:
◦ The summary alone offers limited insight into FineLogic's specific implementation details or evaluation criteria.