Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

Created by
  • Haebom

Authors

Alice Schiavone, Marco Fraccaro, Lea Marie Pehrson, Silvia Ingala, Rasmus Bonnevie, Michael Bachmann Nielsen, Vincent Beliveau, Melanie Ganz, Desmond Elliott


Outline

MOSAIC is a multilingual, taxonomy-agnostic, and computationally efficient approach to radiology report classification. It is built on a compact, publicly available language model (MedGemma-4B) and supports zero-shot prompting, few-shot prompting, and lightweight fine-tuning. MOSAIC was evaluated on seven datasets in English, Spanish, French, and Danish, covering multiple imaging modalities and labeling schemes. On five chest X-ray datasets it achieves an average macro F1 score of 88, approaching or exceeding expert-level performance, while requiring only 24 GB of GPU memory. With data augmentation, it reaches a weighted F1 score of 82 on Danish reports using only 80 annotated samples. The code and models are open source.
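The outline quotes two different averages of the F1 score, macro and weighted, and they can differ noticeably when classes are imbalanced. The sketch below shows how each is computed, assuming a single-label setting for simplicity. The label names and toy data are invented for illustration and are not taken from the paper, and the functions return values in the range 0 to 1, whereas the summary reports scores scaled by 100.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy data: 8 "normal" reports and 2 "effusion" reports,
# with one "effusion" report misclassified as "normal".
y_true = ["normal"] * 8 + ["effusion"] * 2
y_pred = ["normal"] * 8 + ["normal", "effusion"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1: {macro:.3f}, weighted F1: {weighted:.3f}")
```

On this toy data the weighted score is higher than the macro score, because the error falls on the rare class, which the macro average counts as heavily as the common one. This is one reason the two figures in the outline should not be compared directly.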

Takeaways, Limitations

Multilingual support: evaluated on English, Spanish, French, and Danish datasets.
Taxonomy-agnostic: supports multiple imaging modalities and labeling schemes.
Computational efficiency: uses a small open language model (MedGemma-4B) and runs within 24 GB of GPU memory, so it can be deployed on consumer-grade GPUs.
Zero-/few-shot support: achieves high performance even with a small amount of annotated data.
Open source: code and models released.
Limitations: the evaluation is restricted to the four languages and seven datasets used in the study.
Limitations: further evaluation is needed before extending the approach to other imaging modalities and labeling schemes.
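To make the taxonomy-agnostic, zero-/few-shot prompting mentioned in the takeaways more concrete, here is a minimal sketch of a prompt builder that accepts any label set and an optional list of worked examples. It is an illustration only: the function name `build_prompt`, the prompt wording, and the sample reports are all hypothetical and should not be read as the prompt format used by the MOSAIC authors.

```python
def build_prompt(report, labels, examples=()):
    """Assemble a classification prompt for an arbitrary label set.

    examples: iterable of (report_text, [labels]) pairs.
    An empty iterable gives a zero-shot prompt; a few pairs give a few-shot prompt.
    The wording below is a hypothetical template, not the paper's.
    """
    lines = [
        "Classify the radiology report. Answer with a comma-separated list",
        "of the findings that are present, chosen only from: "
        + ", ".join(labels) + ".",
        "",
    ]
    for ex_report, ex_labels in examples:
        lines += [
            f"Report: {ex_report}",
            f"Findings: {', '.join(ex_labels) or 'none'}",
            "",
        ]
    # The prompt ends with an open "Findings:" slot for the model to complete.
    lines += [f"Report: {report}", "Findings:"]
    return "\n".join(lines)


# Hypothetical label set and reports, for illustration only.
labels = ["effusion", "pneumothorax"]
zero_shot = build_prompt("No pleural effusion.", labels)
few_shot = build_prompt(
    "No pleural effusion.",
    labels,
    examples=[("Small left effusion.", ["effusion"])],
)
print(few_shot)
```

Because the label list is an argument rather than part of the template, switching to a different labeling scheme or language only changes the inputs, which is the general idea behind a taxonomy-agnostic design.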