This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and institutions; please credit the source when sharing.
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Created by
Haebom
Author
Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Ningyu Zhang, Huajun Chen
Outline
This paper investigates why open-source large language models (LLMs) struggle with data analysis and explores strategies for improving their data analysis capabilities. Using a seed dataset of diverse, realistic scenarios, we evaluate model performance along three key dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: the quality of strategic planning is the primary determinant of model performance; interaction design and task complexity significantly affect reasoning performance; and data quality has a greater impact than data diversity on achieving optimal performance. Based on these insights, we develop a data synthesis methodology that significantly improves the analytical reasoning capabilities of open-source LLMs. The code can be found at https://github.com/zjunlp/DataMind.
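To make the three-dimension evaluation concrete, below is a minimal sketch of how responses to a data-analysis task might be scored along data understanding, code generation, and strategic planning. All class names, fields, and scoring rules here are hypothetical illustrations, not the authors' actual pipeline; their implementation is in the DataMind repository linked above.

```python
from dataclasses import dataclass


@dataclass
class AnalysisTask:
    question: str          # a realistic analysis scenario from the seed dataset
    table_preview: str     # textual preview of the data the model must reason over
    reference_answer: str  # expected final answer


@dataclass
class DimensionScores:
    data_understanding: float
    code_generation: float
    strategic_planning: float


def score_response(task: AnalysisTask, response: str) -> DimensionScores:
    """Toy scoring heuristics for illustration only. A real evaluation would
    execute the generated code and use stricter checks or judge models."""
    return DimensionScores(
        # Did the model arrive at the reference answer from the data?
        data_understanding=float(task.reference_answer.lower() in response.lower()),
        # Did the model produce an executable code block?
        code_generation=float("```python" in response),
        # Did the model lay out an explicit plan before acting?
        strategic_planning=float("plan" in response.lower()),
    )


if __name__ == "__main__":
    task = AnalysisTask(
        question="Which region had the highest quarterly revenue growth?",
        table_preview="region,quarter,revenue\nEMEA,Q1,1.2M\n...",
        reference_answer="EMEA",
    )
    response = (
        "Plan: inspect the table, compute growth per region, report the max.\n"
        "```python\n# analysis code here\n```\n"
        "Answer: EMEA."
    )
    print(score_response(task, response))
```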