Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

Created by
  • Haebom

Author

Jiarui Yao, Ruida Wang, Tong Zhang

Outline

Large-scale language models (LLMs) demonstrate remarkable performance in a variety of tasks, including text generation, classification, and question answering, but their reasoning capabilities remain a matter of debate. The inherent ambiguity of natural language (NL) limits the verifiable reasoning capabilities of LLMs, resulting in a lack of consistent answers and reliable support. To address this issue, this paper proposes FANS: Formal ANSwer Selection for Natural Language Math Reasoning Using Lean4, a novel framework that enhances NL mathematical reasoning using Lean4. FANS transforms NL mathematical problems and LLM-generated answers into Lean4 theorems, attempts to prove them using a Lean4 prover, and verifies them using Lean4. Ultimately, the FL results are used to support answer selection. This framework enhances the NL mathematical capabilities of LLMs by providing computer-verifiable solutions to correct answers, and presents an alternative to reward models for answer selection. Extensive experimental results demonstrate that this framework can improve the accuracy of LLM with a reward model by up to 1.91% on the MATH-500 dataset and by up to 8.33% on the AMC-23 dataset. In certain domains, such as number theory, where Lean4 experts excel, it can even select all correct answers. Qualitative analysis also demonstrates that this framework can formally support NL results with Lean4 proofs. This pioneering work in this field will further accelerate the advancement of this field by making all models and datasets publicly available.

Takeaways, Limitations

Takeaways:
Improving NL Mathematical Reasoning Skills in LLMs Using Lean4.
Presenting an answer selection method that replaces the reward model.
Demonstrated accuracy improvements on MATH-500 and AMC-23 datasets.
In certain fields, such as number theory, all correct answers can be selected.
Supporting NL results with Lean4 proof.
Contribute to research advancement through open access to models and datasets.
Limitations:
Limitations is not listed in the paper. (Not included in the abstract)
👍