Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

Created by
  • Haebom

Authors

Haoyu Dong, Pengkun Zhang, Mingzhe Lu, Yanzhen Shen, Guolin Ke

Outline

This paper presents MachineLearningLM, a framework that enhances the in-context learning (ICL) capabilities of large language models (LLMs) through continued pretraining on a wide variety of machine learning (ML) tasks synthesized from millions of structural causal models (SCMs). To improve the robustness of numerical modeling, the LLM is warmed up with decision strategies derived from random forests. Token-efficient prompts allow 3-6x more examples to fit in each context window, and batch inference yields up to a 50x throughput improvement.

Built on Qwen-2.5-7B-Instruct, MachineLearningLM outperforms strong LLM baselines (e.g., GPT-5-mini) by an average of roughly 15% on out-of-distribution tabular classification tasks across domains such as finance, physics, biology, and medicine, and its accuracy increases monotonically as the number of in-context examples grows (a many-shot scaling law). It also scores about 75.4% on MMLU, showing that general capabilities (knowledge and reasoning) are retained.
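As a rough illustration of the recipe described above, the sketch below generates a tiny synthetic classification task from a random linear causal graph, fits a random-forest teacher on it, and serializes the examples into a compact many-shot prompt. This is a minimal sketch, not the authors' code: the function names, the CSV-style prompt format, and all hyperparameters are hypothetical choices for illustration.

```python
# Illustrative sketch only: synthesize a small tabular task from a random
# linear causal graph, fit a random-forest teacher, and build a compact
# many-shot ICL prompt. All names and settings here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def sample_scm_task(n_features=5, n_rows=64):
    """Sample a random upper-triangular linear causal graph and draw a dataset."""
    W = np.triu(rng.normal(size=(n_features, n_features)), k=1)  # DAG edge weights
    X = np.zeros((n_rows, n_features))
    for j in range(n_features):                        # ancestral sampling, parents first
        X[:, j] = X @ W[:, j] + rng.normal(size=n_rows)
    logits = X @ rng.normal(size=n_features)            # label depends on the features
    y = (logits > np.median(logits)).astype(int)        # binary class labels
    return X.round(2), y

def to_many_shot_prompt(X, y, n_shots, X_query):
    """Serialize in-context examples as compact CSV-style rows to save tokens."""
    rows = [",".join(map(str, x)) + f"->{t}" for x, t in zip(X[:n_shots], y[:n_shots])]
    queries = [",".join(map(str, x)) + "->?" for x in X_query]
    return "\n".join(rows + queries)

X, y = sample_scm_task()
# A random-forest teacher fit on the synthetic task; its predictions could serve
# as the kind of decision-strategy targets mentioned in the outline.
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:48], y[:48])

print(to_many_shot_prompt(X, y, n_shots=48, X_query=X[48:52]))
print("teacher predictions:", teacher.predict(X[48:52]))
```

In this toy format, each in-context example is a single comma-separated row ending in its label, which is one plausible way to pack several times more examples into a fixed context window than verbose natural-language templates would allow.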

Takeaways, Limitations

Takeaways:
A new framework that substantially improves the in-context learning capabilities of LLMs.
Robust performance gains on out-of-distribution tabular data across diverse domains.
Token-efficient prompting and batch inference techniques that handle a large number of in-context examples efficiently.
Improved performance on tabular prediction tasks while general knowledge and reasoning abilities are preserved.
Experimental verification of many-shot scaling: accuracy increases monotonically with more in-context examples.
Limitations:
The model is currently trained only on Qwen-2.5-7B-Instruct, so further study is needed to determine whether the approach generalizes to other LLMs.
Reliance on random-forest-based decision strategies may itself be a limitation; further research with other types of teacher models is needed.
Performance may vary depending on the type and coverage of the SCMs used.
Generating ML tasks from millions of SCMs and running continued pretraining requires substantial computing resources.