Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; please cite the source when sharing.

Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare

Created by
  • Haebom

Authors

Lovedeep Gondara, Jonathan Simkin, Graham Sayle, Shebnum Devji, Gregory Arbour, Raymond Ng

Outline

This study investigates four questions that guide language model selection for specialized applications: whether fine-tuning is necessary or zero-shot use of pretrained models suffices, whether domain-specific pretraining offers benefits over general pretraining, whether additional domain-specific pretraining adds value, and whether small language models (SLMs) remain relevant relative to large language models (LLMs) for specific tasks. Using electronic pathology reports from the British Columbia Cancer Registry (BCCR), three classification scenarios of varying difficulty and data size were evaluated, with several SLMs and one LLM as candidate models. The SLMs were evaluated both zero-shot and after fine-tuning, while the LLM was evaluated zero-shot only.

Fine-tuning significantly improved SLM performance over zero-shot results in all scenarios. The zero-shot LLM outperformed the zero-shot SLMs but consistently lagged behind the fine-tuned SLMs. Domain-specific SLMs outperformed general SLMs after fine-tuning, particularly on the harder tasks. Additional domain-specific pretraining provided only marginal benefit on the easy task but significant improvements on the complex, data-poor tasks.

In conclusion, fine-tuning SLMs on domain-specific data is crucial and can outperform zero-shot LLMs on targeted classification tasks. Pretraining on domain-relevant or domain-specific data provides additional benefits, especially for complex problems or when fine-tuning data is limited. While LLMs offer powerful zero-shot capabilities, they did not match the performance of properly fine-tuned SLMs on the tasks studied here. Even in the LLM era, SLMs remain relevant and efficient, and can offer a better performance-to-resource balance than LLMs.
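As a rough illustration of the fine-tuned-SLM setup described above (this is a minimal sketch, not the authors' pipeline: the checkpoint name, label set, and toy data are assumptions for demonstration, and the BCCR reports are not publicly available), fine-tuning a small pretrained encoder for report classification can look like this with Hugging Face Transformers:

```python
# Minimal sketch of fine-tuning a small language model (SLM) for
# pathology-report classification. All data and names below are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A general-purpose checkpoint; a domain-relevant or domain-specific
# checkpoint would be substituted in the corresponding conditions.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for labeled report text; real registry data would go here.
train_ds = Dataset.from_dict({
    "text": ["specimen shows invasive carcinoma of the breast",
             "benign fibrous tissue, no evidence of malignancy"],
    "label": [1, 0],
})

def tokenize(batch):
    # Convert report text to model inputs (input_ids, attention_mask).
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()
```

For the zero-shot comparisons, no parameters are updated: the same report text would instead be placed in a classification prompt and sent to the pretrained SLM or LLM directly.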

Takeaways, Limitations

Takeaways:
Fine-tuned SLMs can outperform zero-shot LLMs on domain-specific classification tasks.
Domain-relevant or domain-specific pre-training contributes to improved performance, especially on difficult tasks or when fine-tuning data is scarce.
SLMs remain useful even in the LLM era and can offer a better performance-to-resource balance than LLMs.
Limitations:
The dataset used in the study was limited to electronic pathology reports from the British Columbia Cancer Registry (BCCR), which may limit generalizability.
Only one LLM was evaluated, so comparisons with other LLMs are lacking.
Further research is needed to draw generalized conclusions across different types of tasks.