Daily Arxiv

This page curates papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Is Pre-training Truly Better Than Meta-Learning?

Created by
  • Haebom

Author

Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

Outline

This paper re-examines the prevailing belief that pre-trained (PT) models with fine-tuning outperform meta-learning algorithms in few-shot learning. Using diverse datasets, the authors compare PT with model-agnostic meta-learning (MAML) under the same architecture, optimizer, and training-to-convergence conditions. They quantify the practical significance of performance differences with the effect size (Cohen's d) and characterize each dataset by computing its formal diversity (the diversity coefficient). The results show that PT outperforms MAML when a dataset has low formal diversity, and MAML outperforms PT when formal diversity is high; in both cases, however, the effect size stays below 0.2, which by Cohen's convention is a negligible difference. Large-scale experiments spanning 21 few-shot learning benchmarks, including the Meta-Dataset, as well as GPT-2 on the Openwebtext dataset, likewise show no meaningful difference. The paper therefore concludes that pre-trained models do not always outperform meta-learning models and that the formal diversity of the dataset is an important factor.
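Because the conclusion rests on the effect size rather than on raw accuracy gaps, a minimal sketch of how Cohen's d could be computed for two sets of few-shot accuracies may help; the function name and the accuracy arrays below are illustrative assumptions, not code or results from the paper.

```python
# Minimal sketch: comparing PT vs. MAML few-shot accuracies with Cohen's d.
# The accuracy values are illustrative placeholders, not the paper's results.
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Effect size between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical per-episode accuracies on a single benchmark.
pt_acc = np.array([0.62, 0.65, 0.61, 0.64, 0.63])
maml_acc = np.array([0.61, 0.66, 0.62, 0.63, 0.64])

d = cohens_d(pt_acc, maml_acc)
# |d| < 0.2 is conventionally treated as a negligible effect.
print(f"Cohen's d = {d:.3f} -> {'negligible' if abs(d) < 0.2 else 'non-negligible'} effect")
```

Under this convention, a difference can be real yet still count as practically negligible when |d| falls below 0.2, which is the situation the paper reports for PT versus MAML.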

Takeaways, Limitations

Takeaways:
A rebuttal to the conventional wisdom that pre-trained models are always superior to meta-learning models.
Evidence that the formal diversity of a dataset influences few-shot learning performance.
A more rigorous and fair experimental design for comparing MAML and pre-trained models.
Limitations:
The performance difference between PT and MAML is negligible in effect-size terms (Cohen's d < 0.2), so no strong claim can be made about which method is superior.
Factors other than formal diversity that may influence performance are not examined.
The results may not generalize beyond the specific architectures and optimizers used.