Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning

Created by
  • Haebom

Authors

Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen

Outline

In this paper, we study how model scale affects the reasoning performance of large language models (LLMs). We construct a synthetic multi-step reasoning environment that mimics the structure and distribution of real-world large-scale knowledge graphs, and evaluate LLMs by having them predict missing edges in an incomplete graph. The experiments reveal a U-shaped loss curve: beyond a certain model size, additional parameters hurt reasoning performance because the model over-memorizes the training graph. We investigate how factors such as graph structure, model size, and training steps shape this curve, and propose an empirical scaling law that linearly maps the search entropy of a knowledge graph to the loss-optimal model size, so that the optimal model size for a given knowledge graph can be predicted. In conclusion, this study provides new insight into the relationship between scaling and reasoning in LLMs and suggests a way to optimize model size for reasoning tasks.
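
A minimal sketch of the claimed linear form of this scaling law, using assumed notation rather than the paper's own symbols ($N^{*}$ for the loss-minimizing parameter count, $H_{\mathrm{search}}(\mathcal{G})$ for the search entropy of knowledge graph $\mathcal{G}$, and $a$, $b$ for coefficients that would be fitted empirically):

$$
N^{*}(\mathcal{G}) \;\approx\; a \cdot H_{\mathrm{search}}(\mathcal{G}) + b
$$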

Takeaways, Limitations

Takeaways:
We systematically analyze how scaling affects the reasoning ability of LLMs and reveal the negative impact of over-parameterization.
We present a method for predicting the optimal model size from the search entropy of a knowledge graph.
We introduce a novel synthetic reasoning environment that mimics real-world reasoning scenarios.
Limitations:
The synthetic environment may not capture all the complexity of real-world reasoning.
Further research is needed on how well the empirical scaling law generalizes.
The focus on a specific type of knowledge graph may limit generalizability to other kinds of graphs.