This paper investigates the generalization ability of small-scale language models under two major adaptation paradigms: few-shot prompting and supervised fine-tuning. While prompting is favored for its parameter efficiency and flexibility, it remains unclear how robust it is in resource-poor settings and under distribution shift. We compare prompting and fine-tuning across a variety of task formats, prompting styles, and model sizes, with a particular focus on behavior in in-distribution (ID) and out-of-distribution (OOD) settings. Beyond accuracy, we analyze the internal representations learned under each approach to assess the stability and abstraction of task-specific features. Our results highlight important differences in how small-scale models internalize and generalize knowledge under the two adaptation strategies. The study offers practical guidance for model selection in data-poor environments and contributes empirical evidence to the ongoing debate over prompting versus fine-tuning.
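To make the contrast between the two adaptation paradigms concrete, the sketch below illustrates them in minimal form: few-shot prompting specifies the task purely in-context with frozen parameters, while supervised fine-tuning updates the parameters on labeled examples. This is an illustrative sketch only, not the paper's experimental code; the model name, prompt template, example data, and hyperparameters are placeholder assumptions.

```python
# Illustrative sketch of the two adaptation paradigms (placeholder model,
# prompt format, data, and hyperparameters; not the paper's actual setup).
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a small-scale language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [("great movie", "positive"), ("boring plot", "negative")]
query = "I loved every minute"

# Few-shot prompting: parameters stay frozen; the task is conveyed in-context.
prompt = "".join(f"Review: {x}\nLabel: {y}\n" for x, y in examples)
prompt += f"Review: {query}\nLabel:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=2)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))

# Supervised fine-tuning: parameters are updated on the labeled examples.
optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for x, y in examples:
    batch = tokenizer(f"Review: {x}\nLabel: {y}", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```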