Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts

Created by
  • Haebom

Author

Rahul Raja, Arpita Vats

Outline

This paper investigates the generalization ability of small language models under two major adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is favored for its parameter efficiency and flexibility, it is unclear how robust it is in low-resource settings and under distribution shift. The paper compares prompting and fine-tuning across a variety of task formats, prompting styles, and model sizes, focusing in particular on behavior in in-distribution and out-of-distribution (OOD) settings. Beyond accuracy, it analyzes the internal representations learned by each approach to assess the stability and abstraction of task-specific features. The results highlight important differences in how small models internalize and generalize knowledge under different adaptation strategies, offering practical guidance for model selection in data-scarce settings and empirical insight into the ongoing prompting-versus-fine-tuning debate.
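The representation-stability comparison described above can be illustrated with a small sketch. The paper summary does not state which similarity metric is used; linear centered kernel alignment (CKA) is one common choice for comparing layer activations across two model states (e.g. before vs. after fine-tuning, or on in-distribution vs. OOD inputs), and the function names and matrices below are purely illustrative.

```python
# Illustrative linear CKA (centered kernel alignment) in pure Python.
# This is a sketch of one common representation-similarity metric,
# not the metric the paper necessarily uses.

def _center_columns(m):
    """Subtract each column's mean (rows = examples, columns = features)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]

def _frobenius_sq_of_cross(a, b):
    """||A^T B||_F^2 for matrices A (n x p) and B (n x q) given as row lists."""
    p, q = len(a[0]), len(b[0])
    total = 0.0
    for i in range(p):
        for j in range(q):
            dot = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Similarity in [0, 1] between two representation matrices whose rows
    are the same examples but whose feature widths may differ."""
    xc, yc = _center_columns(x), _center_columns(y)
    num = _frobenius_sq_of_cross(xc, yc)
    den = (_frobenius_sq_of_cross(xc, xc) * _frobenius_sq_of_cross(yc, yc)) ** 0.5
    return num / den
```

In a study like this one, `x` and `y` would hold a layer's activations for the same probe inputs under two adaptation strategies; a CKA near 1 indicates the representation is stable across them. Linear CKA is invariant to isotropic scaling and orthogonal rotation of the features, which is why it is popular for comparing networks.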

Takeaways, Limitations

Takeaways: A comparative analysis of few-shot prompting and fine-tuning for small language models reveals how their generalization performance differs in low-resource settings and under distribution shift. The study offers practical guidance for model selection in data-scarce situations, and its internal-representation analysis surfaces the strengths and weaknesses of each adaptation strategy.
Limitations: The summary lacks information on the specific experimental setup, datasets, and model types. There is no detailed description of how OOD generalization was evaluated, and the metrics and evaluation criteria used in the analysis are not clearly stated. Releasing the code is a positive step, but it is unclear whether enough detail is provided to ensure reproducibility.