Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning

Created by
  • Haebom

Author

Prateek Chanda, Saral Sureka, Parth Pratim Chatterjee, Krishnateja Killamsetty, Nikhil Shivakumar Nayak, Ganesh Ramakrishnan

Outline

This paper highlights that the fine-tuning performance of large language models (LLMs) depends heavily on the composition of the training data mixture, yet selecting the optimal mixture remains a manual, heuristic-driven process. To address this, the authors propose TASKPGM, a principled and scalable mixture-optimization framework that selects continuous task ratios by minimizing an energy function over a Markov Random Field (MRF). TASKPGM models relationships between tasks using behavioral divergences, such as Jensen-Shannon Divergence and Pointwise Mutual Information, computed from the predictive distributions of single-task fine-tuned models. It admits a closed-form solution under its constraints and provably balances representativeness and diversity across tasks. The method yields consistent empirical gains on benchmarks such as MMLU and BIGBench with Llama 2 and Mistral, along with theoretical guarantees (including weak submodularity for budget-constrained variants). Beyond raw performance, TASKPGM offers interpretable insights into task influence and mixture composition, making it a practical tool for efficient and robust LLM fine-tuning.
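
To make the idea concrete, here is a minimal, hypothetical sketch of the general recipe described above: compute pairwise behavioral divergences (Jensen-Shannon) between the predictive distributions of single-task fine-tuned models on a shared probe set, then minimize a quadratic, MRF-style energy over the probability simplex to obtain continuous task mixture ratios. The function names, the exponential divergence-to-similarity kernel, and the exact energy form are illustrative assumptions, not the paper's actual formulation or code.

```python
# Hypothetical sketch of the TASKPGM-style pipeline (not the paper's code).
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.optimize import minimize

def pairwise_jsd(pred_dists: np.ndarray) -> np.ndarray:
    """pred_dists: (num_tasks, num_probe_examples, vocab) predictive distributions
    of each single-task fine-tuned model on a shared probe set.
    Returns a (num_tasks, num_tasks) matrix of mean Jensen-Shannon divergences."""
    T = pred_dists.shape[0]
    D = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1, T):
            # jensenshannon returns the JS distance (sqrt of the divergence); square it.
            d = np.mean([jensenshannon(p, q) ** 2
                         for p, q in zip(pred_dists[i], pred_dists[j])])
            D[i, j] = D[j, i] = d
    return D

def optimal_mixture(D: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Minimize an assumed energy E(w) = -lam * w^T s + w^T S w over the simplex,
    where S is a similarity matrix derived from the divergence matrix D.
    The linear term rewards representativeness; the quadratic term penalizes
    redundancy, encouraging diversity. This is a generic trade-off sketch."""
    S = np.exp(-D)          # divergence -> similarity (assumed kernel)
    s = S.mean(axis=1)      # how representative each task is of the others
    T = len(s)

    def energy(w):
        return -lam * (w @ s) + w @ S @ w

    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * T
    w0 = np.full(T, 1.0 / T)
    res = minimize(energy, w0, bounds=bounds, constraints=cons)
    return res.x

# Toy usage: 4 tasks, 8 probe examples, vocabulary of 50 tokens.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(50), size=(4, 8))
weights = optimal_mixture(pairwise_jsd(probs))
print(weights.round(3))   # continuous task sampling ratios summing to 1
```

In this sketch the returned weights would be used as sampling ratios when assembling the fine-tuning data mixture; the paper's actual energy function, constraints, and divergence estimates may differ.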

Takeaways, Limitations

Takeaways:
Presents TASKPGM, a principled and scalable framework for data mixture optimization in LLM fine-tuning.
Models inter-task relationships using the predictive distributions of single-task fine-tuned models, balancing representativeness and diversity.
Demonstrates consistent performance improvements on benchmarks such as MMLU and BIGBench with Llama 2 and Mistral.
Provides interpretable insights into task influence and mixture composition.
Provides theoretical guarantees (including weak submodularity).
Limitations:
Further research is needed on the practical applicability and generalization performance of TASKPGM.
Performance evaluation of TASKPGM is needed for various LLM architectures and task types.
Further analysis of the computational cost and efficiency of the energy function minimization process is needed.
The appropriateness of the behavioral divergence measures (e.g., JSD, PMI) used in the MRF modeling needs further examination.