Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FlexOlmo: Open Language Models for Flexible Data Use

Created by
  • Haebom

Authors

Weijia Shi, Akshita Bhagia, Kevin Farhat, Niklas Muennighoff, Pete Walsh, Jacob Morrison, Dustin Schwenk, Shayne Longpre, Jake Poznanski, Allyson Ettinger, Daogao Liu, Margaret Li, Dirk Groeneveld, Mike Lewis, Wen-tau Yih, Luca Soldaini, Kyle Lo, Noah A. Smith, Luke Zettlemoyer, Pang Wei Koh, Hannaneh Hajishirzi, Ali Farhadi, Sewon Min

Overview

FlexOlmo is a new class of language models that supports distributed training without data sharing and data-flexible inference. It uses a mixture-of-experts (MoE) architecture in which each expert is trained independently on a closed dataset and then integrated through a new domain-informed routing method, without joint training. The models are trained on FlexMix, a corpus of public datasets plus 7 domain-specific datasets that stand in for closed data, and models with up to 37 billion parameters (20 billion active) are evaluated on 31 downstream tasks. A general expert trained on public data can be effectively combined with experts trained independently by other data owners, yielding an average relative improvement of 41%. Specific data can be selectively excluded at inference time to meet data license or permission requirements, and FlexOlmo outperforms existing model merging methods by 10.1% on average. This makes it a useful option for data owners and researchers in regulated industries who hold sensitive or protected data.
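The data-flexible inference idea described above can be illustrated with a small sketch. This is not FlexOlmo's actual router: the expert names, the router scores, the `top_k` value, and the `route` and `moe_output` helpers are all hypothetical, chosen only to show how an expert (and so the data behind it) might be dropped at inference time without retraining.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, expert_names, excluded=frozenset(), top_k=2):
    """Pick the top_k highest-scoring experts that are not excluded.

    Returns a dict {expert_name: weight}; the weights sum to 1.
    Excluded experts never receive weight, which models opting a
    data owner's expert out of inference.
    """
    allowed = [
        (name, score)
        for name, score in zip(expert_names, router_scores)
        if name not in excluded
    ]
    if not allowed:
        raise ValueError("at least one expert must remain active")
    allowed.sort(key=lambda pair: pair[1], reverse=True)
    chosen = allowed[:top_k]
    weights = softmax([score for _, score in chosen])
    return {name: w for (name, _), w in zip(chosen, weights)}


def moe_output(expert_outputs, routing):
    """Weighted sum of the selected experts' output vectors."""
    dim = len(next(iter(expert_outputs.values())))
    out = [0.0] * dim
    for name, weight in routing.items():
        for i, value in enumerate(expert_outputs[name]):
            out[i] += weight * value
    return out


# Hypothetical experts: one trained on public data, three on closed data.
experts = ["public", "news", "code", "math"]
scores = [1.0, 2.5, 0.3, 2.0]  # made-up router scores for one token

# All experts available: the two highest-scoring experts are used.
full = route(scores, experts)

# The "news" data owner opts out: routing falls back to the next best expert.
opted_out = route(scores, experts, excluded={"news"})

# Toy two-dimensional expert outputs, combined under the opt-out routing.
outputs = {"public": [1.0, 0.0], "news": [0.0, 1.0],
           "code": [1.0, 1.0], "math": [0.5, 0.5]}
mixed = moe_output(outputs, opted_out)
```

In this sketch, excluding an expert only changes which scores the router considers; the remaining experts are untouched, which is the property that lets data be removed without further training.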

Takeaways, Limitations

Takeaways:
Distributed training lets the model benefit from data held by multiple owners without that data ever being shared.
Data-flexible inference enables compliance with data license and permission requirements, while providing flexible control over the inclusion of specific data.
It outperforms both a standard MoE trained without data restrictions and existing model merging methods.
It offers a practical way to use sensitive data in regulated industries.
Limitations:
Further validation is needed on how well the closed datasets in the FlexMix corpus represent realistic closed datasets.
Further research is needed on generalization performance on various types of closed datasets.
A detailed description of the routing method using domain information and further analysis of its effectiveness are needed.