
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

Created by
  • Haebom

Author

Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Yeyun Gong, Peng Cheng, Mao Yang

Outline

This paper addresses the catastrophic forgetting that arises when large language models are continually pre-trained on small, task-specific data to adapt them to new target domains. Existing domain re-weighting strategies rely on manually crafted heuristics based on human intuition or empirical results; the authors instead propose Data Mixing Agent, the first model-based, end-to-end framework that parameterizes such heuristics in a more general form. The agent learns generalizable re-weighting heuristics via reinforcement learning over a large number of data-mixing trajectories, using feedback from an evaluation environment. In continual pre-training experiments on mathematical reasoning, Data Mixing Agent outperforms strong baselines in achieving balanced performance across source- and target-domain benchmarks. It also generalizes well to unseen source domains, target models, and domain spaces without retraining. Direct application to the code-generation domain demonstrates its adaptability to new target domains. Further analysis shows that the agent's heuristics align well with human intuition and that it reaches strong model performance with less source-domain data.
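To make the idea of feedback-driven domain re-weighting concrete, here is a minimal toy sketch. All names and the update rule are hypothetical illustrations: the actual Data Mixing Agent is a learned model trained with reinforcement learning over data-mixing trajectories, not the hand-written rule below.

```python
# Toy sketch of feedback-driven domain re-weighting for continual
# pre-training. Hypothetical example, not the paper's actual agent.

def normalize(weights):
    """Rescale weights so they form a sampling distribution."""
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def reweight(weights, eval_scores, lr=0.5):
    """Shift sampling probability toward domains whose held-out
    evaluation score is low, mimicking evaluation feedback."""
    # A domain with a lower score gets its weight increased,
    # so the next training batch samples it more often.
    adjusted = {d: w * (1.0 + lr * (1.0 - eval_scores[d]))
                for d, w in weights.items()}
    return normalize(adjusted)

# Start from a uniform mix of a source and a target domain.
weights = normalize({"general_web": 1.0, "math": 1.0})

# Pretend evaluation: the source benchmark is degrading (0.4)
# while the target benchmark is improving (0.9).
scores = {"general_web": 0.4, "math": 0.9}
weights = reweight(weights, scores)

# The mix now favors the degrading source domain to curb forgetting.
print(weights)
```

In the real framework this hand-tuned rule is replaced by a policy that learns when and how much to shift the mixture, which is what lets it generalize to unseen source domains and target models.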

Takeaways, Limitations

Takeaways:
It automates domain re-weighting through a model-based, end-to-end framework, overcoming the limitations of existing manual, heuristic-based methods.
It outperforms existing methods on mathematical reasoning and code generation, demonstrating applicability across domains.
It improves efficiency, achieving strong performance with less source-domain data.
The learned heuristics align well with human intuition, increasing trust in the method.
It generalizes well to unseen source domains, target models, and domain spaces.
Limitations:
Training the Data Mixing Agent may require substantial data and computational resources.
The agent may have learned domain-specific heuristics; further research is needed to determine whether it generalizes to all domains.
Performance may depend on the design of the evaluation environment, whose generalizability also requires further study.