Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation

Created by
  • Haebom

Authors

Weiming Wu, Jin Ye, Zi-kang Wang, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo

Outline

To improve the geometric reasoning capabilities of multimodal large language models (MLLMs), acquiring large-scale, high-quality reasoning data is crucial. To overcome the limitations of existing data generation methods, we propose NeSyGeo, a novel neuro-symbolic framework. NeSyGeo uses a domain-specific language that comprehensively represents all elements of plane geometry, synthesizes symbolic sequences and maps them to visual and textual representations, and generates reasoning paths through backward search and forward validation. Based on this framework, we build the NeSyGeo-CoT and NeSyGeo-Caption datasets, each containing 100,000 samples, and release NeSyGeo-Test, a new benchmark for evaluating the geometric reasoning capabilities of MLLMs. Experimental results show that the proposed method significantly improves the performance of several MLLMs, particularly when trained on few samples for few epochs.
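The idea of generating a reasoning path by backward search and then checking it by forward validation can be illustrated with a toy sketch. This is not the paper's implementation or its domain-specific language: the `Rule` type, the three right-triangle rules, and the function names `backward_search` and `forward_validate` are all hypothetical, chosen only to show the general pattern of searching backward from a target quantity and then replaying the steps forward with concrete numbers.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """A hypothetical symbolic rule: derive `output` from `inputs`."""
    name: str
    inputs: tuple[str, ...]
    output: str
    fn: Callable[..., float]


# Toy rule set for a right triangle with legs a, b and hypotenuse c (assumed).
RULES = [
    Rule("pythagoras", ("a", "b"), "c", lambda a, b: math.hypot(a, b)),
    Rule("perimeter", ("a", "b", "c"), "perimeter", lambda a, b, c: a + b + c),
    Rule("area", ("a", "b"), "area", lambda a, b: a * b / 2),
]


def backward_search(target: str, givens: set[str],
                    rules: list[Rule] = RULES) -> Optional[list[Rule]]:
    """Search backward from `target` to the givens.

    Returns rules ordered so that every input is derived before it is used,
    or None if the target cannot be reached. A failed branch may leave
    extra (still valid) steps in the path; a real system would prune them.
    """
    path: list[Rule] = []
    resolved = set(givens)

    def resolve(symbol: str, stack: tuple[str, ...] = ()) -> bool:
        if symbol in resolved:
            return True
        if symbol in stack:  # avoid circular derivations
            return False
        for rule in rules:
            if rule.output == symbol and all(
                resolve(i, stack + (symbol,)) for i in rule.inputs
            ):
                path.append(rule)
                resolved.add(symbol)
                return True
        return False

    return path if resolve(target) else None


def forward_validate(path: list[Rule], givens: dict[str, float]):
    """Replay the path forward with numbers, failing if a step lacks inputs."""
    values = dict(givens)
    steps = []
    for rule in path:
        missing = [i for i in rule.inputs if i not in values]
        if missing:
            raise ValueError(f"{rule.name} is missing inputs: {missing}")
        values[rule.output] = rule.fn(*(values[i] for i in rule.inputs))
        steps.append(f"{rule.name}: {rule.output} = {values[rule.output]:g}")
    return values, steps


givens = {"a": 3.0, "b": 4.0}
path = backward_search("perimeter", set(givens))
values, steps = forward_validate(path, givens)
print("\n".join(steps))
```

In this sketch the backward pass decides which rules are needed, while the forward pass produces the step-by-step text that could serve as a chain-of-thought sample; separating the two keeps the numeric check independent of the search.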

Takeaways, Limitations

Takeaways:
We address the diversity and numerical generalization challenges of geometric reasoning data generation with a novel neuro-symbolic framework, NeSyGeo.
The NeSyGeo framework has proven effective in improving the geometric reasoning capabilities of MLLMs.
Even with few training samples and few training epochs, the performance of MLLMs can be significantly improved.
A 4B-parameter model can achieve better results than an 8B-parameter model.
Limitations:
The paper does not state specific limitations.