Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

Created by
  • Haebom

Author

Haoyu Dong, Pengkun Zhang, Mingzhe Lu, Yanzhen Shen, Guolin Ke

Outline

This paper presents MachineLearningLM, a framework that strengthens the in-context learning (ICL) capabilities of large language models (LLMs). MachineLearningLM is continually pretrained on a large collection of machine learning (ML) tasks synthesized from millions of structural causal models (SCMs). Specifically, it instills a random-forest-based decision strategy into the LLM to make its numerical modeling more robust, and it uses token-efficient prompts that pack more examples into each context window, improving throughput. Experiments show that MachineLearningLM outperforms strong LLM baselines by roughly 15% on average on out-of-distribution tabular classification tasks across diverse domains, and it exhibits a striking many-shot scaling behavior: accuracy increases monotonically with the number of in-context examples. It also retains general chat, knowledge, and reasoning capabilities.
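The summary does not spell out how tasks are synthesized from SCMs, so the following is a minimal sketch of the general idea under stated assumptions: a random linear-Gaussian SCM over a sparse DAG, with one node thresholded into a binary label. The function name, graph prior, and label rule are illustrative, not the paper's actual generator.

```python
import numpy as np

def sample_scm_task(n_features=8, n_rows=64, seed=0):
    """Sketch of drawing one synthetic tabular classification task from a
    random linear-Gaussian SCM. Hypothetical: the paper's SCM priors and
    labeling rule are not specified in this summary."""
    rng = np.random.default_rng(seed)
    d = n_features + 1                       # last node becomes the label
    # Strictly upper-triangular weights => acyclic causal graph
    W = np.triu(rng.normal(0.0, 1.0, (d, d)), k=1)
    W *= rng.random((d, d)) < 0.5            # sparsify the edge set
    X = np.zeros((n_rows, d))
    for j in range(d):                       # ancestral sampling in topological order
        noise = rng.normal(0.0, 1.0, n_rows)
        X[:, j] = X @ W[:, j] + noise
    features, label_score = X[:, :-1], X[:, -1]
    y = (label_score > np.median(label_score)).astype(int)  # binarize the label node
    return features, y

X, y = sample_scm_task()
print(X.shape, y[:10])   # (64, 8) and a 0/1 label vector
```

Pretraining would then serialize many such (features, labels) tasks as in-context examples, so the model learns to do amortized prediction across tasks rather than memorize any single dataset.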

Takeaways, Limitations

Takeaways:
Presents a novel framework that effectively enhances the in-context learning (ICL) capabilities of LLMs.
Achieves superior performance over existing models on ML tasks across a wide range of domains.
Verifies a many-shot scaling behavior in which performance improves as the number of in-context examples increases.
Enhances ML capabilities while preserving general chat, knowledge, and reasoning capabilities.
Significantly improves throughput through token-efficient prompts (see the serialization sketch after this list).
Limitations:
Experimental results are reported only for a single model scale (Qwen-2.5-7B-Instruct) with LoRA; generalizability to other models and settings requires further study.
The process for generating and selecting structural causal models (SCMs) is not described in detail.
Generalization to a broader range of ML task types needs further validation.
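The throughput gain credited to token-efficient prompts comes from packing more shots into a fixed context window. The paper's exact serialization is not given here, so below is a hedged sketch of one compact row-per-example format; the delimiter, header, and rounding choices are assumptions for illustration only.

```python
import numpy as np

def serialize_examples(X, y, query_row, decimals=2):
    """Hypothetical compact serialization for many-shot tabular ICL:
    one comma-separated line per example instead of verbose key/value
    prose, so more shots fit in the same context window."""
    lines = ["f1,...,fN -> label"]           # terse header instead of column descriptions
    for row, label in zip(X, y):
        vals = ",".join(f"{v:.{decimals}f}" for v in row)
        lines.append(f"{vals} -> {label}")
    qvals = ",".join(f"{v:.{decimals}f}" for v in query_row)
    lines.append(f"{qvals} -> ?")            # query row whose label the model predicts
    return "\n".join(lines)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = (X[:, 0] > 0).astype(int)
prompt = serialize_examples(X, y, rng.normal(size=4))
print(prompt.splitlines()[1])   # e.g. "0.13,-0.13,0.64,0.10 -> 1"
print(prompt.splitlines()[-1])  # query row ending in "-> ?"
```

Shorter per-example encodings like this are what make the reported many-shot regime (accuracy rising monotonically with the shot count) practical within a bounded context window.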