Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

Created by
  • Haebom

Authors

Haoyu Dong, Pengkun Zhang, Mingzhe Lu, Yanzhen Shen, Guolin Ke

Outline

This paper presents MachineLearningLM, a continued pretraining framework that addresses the difficulty large language models (LLMs) have in learning from many in-context examples on standard machine learning (ML) tasks. MachineLearningLM pretrains an LLM on ML tasks synthesized from millions of structural causal models (SCMs). It uses random forests to distill tree-based decision strategies into the LLM, improving the robustness of its numerical modeling, and it employs token-efficient prompts that fit 3-6x more examples per context window while achieving up to 50x higher throughput through batch inference. Despite a modest setup based on Qwen-2.5-7B-Instruct with LoRA, it outperforms strong LLM baselines by an average of about 15% on out-of-distribution tabular classification across diverse domains (finance, physics, biology, and medicine), with accuracy increasing monotonically as the number of in-context examples grows. It also retains general conversational ability, scoring 75.4% on MMLU.
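To make the token-efficient prompting idea concrete, here is a minimal Python sketch of how many labeled rows and several query rows might be packed into one compact prompt. The column names, delimiter scheme, and instruction wording are illustrative assumptions, not the paper's exact format.

```python
from __future__ import annotations

from typing import Sequence


def build_many_shot_prompt(
    feature_names: Sequence[str],
    shots: Sequence[tuple[Sequence[float], str]],
    queries: Sequence[Sequence[float]],
) -> str:
    """Pack labeled examples and unlabeled query rows into one compact prompt.

    Values are rendered as terse comma-separated rows instead of verbose
    "feature = value" sentences, which is one way to fit several times more
    examples into a context window; batching many query rows into a single
    prompt is likewise one way to raise inference throughput. The actual
    encoding used by MachineLearningLM may differ.
    """
    header = ",".join(feature_names) + ",label"
    example_rows = [",".join(f"{v:g}" for v in x) + f",{y}" for x, y in shots]
    query_rows = [",".join(f"{v:g}" for v in x) + ",?" for x in queries]
    return "\n".join(
        ["Predict the label for every row whose label is '?'.",
         header, *example_rows, *query_rows]
    )


if __name__ == "__main__":
    # Toy example: two labeled rows, one query row.
    prompt = build_many_shot_prompt(
        feature_names=["x1", "x2"],
        shots=[([0.1, 1.2], "A"), ([0.9, 0.3], "B")],
        queries=[[0.2, 1.1]],
    )
    print(prompt)
```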

Takeaways, Limitations

Takeaways:
Presents a new framework that substantially strengthens the in-context learning capability of LLMs.
Improves robustness in numerical modeling.
Enables efficient learning and inference through token-efficient prompts and batch inference.
Achieves superior performance compared to existing baselines across diverse domains.
Shows clear, monotonic accuracy gains as the number of in-context examples increases.
Maintains general conversational ability.
Limitations:
The study is currently limited to a single LLM (Qwen-2.5-7B-Instruct) and LoRA configuration; further research is needed to assess generalization to other LLMs and training setups.
Lacks analysis of how the type and quality of the SCMs used affect final performance.
Results come from large-scale benchmark experiments; further evaluation in real-world application settings is still required.