Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes

Created by
  • Haebom

Author

Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer

Outline

This paper evaluates eleven open-source large language models (LLMs), ranging from 1 to 70 billion parameters, for improving Germany's largely manual tumor documentation process. Performance was assessed on three basic tasks: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. Using an annotated dataset created from anonymized urologists' notes, the models were analyzed under several prompting strategies. Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12B performed best, while models with fewer than 7 billion parameters performed markedly worse. Including examples from non-urological medical fields in the prompts further improved performance, suggesting that open-source LLMs hold significant potential for automating tumor documentation. The authors conclude that models with 7 to 12 billion parameters offer the best balance of performance and resource efficiency. The evaluation code and dataset are publicly available.
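The three extraction tasks described above can be imagined as a prompt-and-parse loop around an LLM. The sketch below is purely illustrative and not from the paper: the prompt wording, the `build_prompt`/`parse_reply` helpers, and the parsing rules (a simple ICD-10 code pattern and a German-style date) are all assumptions for illustration.

```python
import re

def build_prompt(note: str) -> str:
    """Frame the three extraction tasks as a single instruction prompt
    (hypothetical wording; the paper's actual prompts differ)."""
    return (
        "Extract from the following doctor's note:\n"
        "1. the tumor diagnosis,\n"
        "2. its ICD-10 code,\n"
        "3. the date of first diagnosis (DD.MM.YYYY).\n\n"
        f"Note: {note}"
    )

def parse_reply(reply: str) -> dict:
    """Pull an ICD-10-style code (e.g. C61) and a German-format date
    out of the model's free-text reply."""
    code = re.search(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b", reply)
    date = re.search(r"\b\d{2}\.\d{2}\.\d{4}\b", reply)
    return {
        "icd10": code.group(0) if code else None,
        "first_diagnosis": date.group(0) if date else None,
    }

# Example reply a model might produce for a urological note:
reply = "Diagnose: Prostatakarzinom, ICD-10 C61, Erstdiagnose 12.03.2021."
print(parse_reply(reply))  # {'icd10': 'C61', 'first_diagnosis': '12.03.2021'}
```

In practice the model call sits between the two helpers; structured parsing like this is one way evaluation against annotated gold labels could be automated.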

Takeaways, Limitations

Takeaways:
  • Demonstrates that open-source LLMs can be effectively used to automate tumor documentation in German medical NLP.
  • LLMs with 7 to 12 billion parameters offer a good balance between performance and resource efficiency.
  • Shows that performance can be improved through varied prompting strategies and additional datasets.
  • A new dataset is released to address the data shortage in German medical NLP.
Limitations:
  • The evaluation was limited to urology; further research is needed to generalize the findings.
  • Further fine-tuning and prompt-engineering research is needed to improve model performance.
  • Models with fewer than 7 billion parameters perform poorly, highlighting the importance of model scale.