Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

ChatSR: Multimodal Large Language Models for Scientific Formula Discovery

Created by
  • Haebom

Authors

Yanjie Li, Lina Yu, Weijun Li, Min Wu, Jingyi Liu, Wenqiang Li, Shu Wei, Yusong Deng

Outline

In this paper, we propose ChatSR, a novel symbolic regression method that guides formula generation by supplying prior knowledge in natural language, building on the knowledge and language-understanding capabilities of multimodal large language models. Unlike existing symbolic regression methods, which generate formulas directly from observed data alone, ChatSR understands and exploits natural-language prior knowledge to improve the quality of the generated formulas. Experiments on 13 datasets show that ChatSR achieves state-of-the-art performance on standard symbolic regression tasks and, notably, exhibits zero-shot capability even for prior knowledge not seen in the training data.
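To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the loop the summary describes: observed data and a natural-language prior are combined into one prompt, a candidate formula is requested from an LLM, and the candidate's fit is checked numerically. The function query_llm, the prompt format, and the returned expression are all illustrative placeholders, not the paper's actual pipeline.

```python
# Hypothetical sketch of natural-language-guided symbolic regression.
# `query_llm` stands in for a real multimodal LLM call; here it returns
# a fixed candidate so the sketch runs end to end.

import numpy as np
import sympy as sp

def query_llm(prompt: str) -> str:
    # Placeholder: a real system would send `prompt` to a multimodal LLM
    # and parse a formula from its response.
    return "3.0 * sin(x) + 0.5 * x"

# Observed data sampled from an unknown target function.
x_obs = np.linspace(-3, 3, 50)
y_obs = 3.0 * np.sin(x_obs) + 0.5 * x_obs

# Natural-language prior knowledge injected into the prompt.
prior = "The formula likely contains a periodic (trigonometric) term."
prompt = (
    f"Prior knowledge: {prior}\n"
    f"Sample data (x, y): {list(zip(x_obs[:5].round(2), y_obs[:5].round(2)))} ...\n"
    "Propose a closed-form expression y = f(x)."
)

# Parse the LLM's candidate expression and measure its fit (R^2).
x = sp.Symbol("x")
candidate = sp.sympify(query_llm(prompt))
f = sp.lambdify(x, candidate, "numpy")
y_hat = f(x_obs)
r2 = 1 - np.sum((y_obs - y_hat) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
print(f"candidate: {candidate}, R^2 = {r2:.4f}")
```

In the paper's setting, the prior would steer the model's search over expressions; the numerical fit check is one simple way to validate whatever candidate comes back.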

Takeaways, Limitations

Takeaways:
Presents a novel approach to symbolic regression by leveraging multimodal large language models.
Improves the accuracy and efficiency of formula generation by supplying prior knowledge in natural language.
Demonstrates superior performance and zero-shot capability compared to existing methods, suggesting potential contributions to scientific discovery and problem solving.
Limitations:
Further analysis is needed of the limits and potential errors of ChatSR's natural-language understanding.
The ability to generate formulas for complex scientific phenomena remains limited, and ways to improve it need to be explored.
Generalization to diverse types of datasets needs to be verified and improved.