Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MedGemma Technical Report

Created by
  • Haebom

Author

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cian Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-Bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Leonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

Outline

MedGemma is a collection of medical vision-language models based on Gemma 3 4B and 27B. To address the challenges of developing AI for healthcare, such as diverse medical data, complex tasks, and the need for privacy, it is presented as a base model that performs well on medical tasks with only a small amount of task-specific tuning data. MedGemma demonstrates advanced medical understanding and reasoning over images and text, significantly outperforming generative models of similar size and approaching the performance of task-specific models. It also improves over existing base models on out-of-distribution tasks (by 2.6-10% on medical multimodal question answering, 15.5-18.1% on chest X-ray finding classification, and 10.8% on agentic evaluations) while retaining the general capabilities of the underlying Gemma 3 models. With fine-tuning, performance in subdomains improves further, reaching pneumothorax classification and histopathology patch classification performance comparable to existing state-of-the-art methods. The report also introduces MedSigLIP, a vision encoder tuned for medical use, which powers MedGemma's visual understanding and matches or exceeds specialized medical image encoders. MedGemma thus provides a foundation of strong medical image and text capabilities with the potential to significantly accelerate medical research and downstream application development.

Takeaways, Limitations

Takeaways:
  • It can accelerate the development of medical AI by providing a powerful foundation model for medical image and language understanding.
  • It offers versatility, adapting to a variety of medical tasks with small amounts of data.
  • It outperforms or approaches existing task-specific models, and also shows performance gains on out-of-distribution tasks.
  • Fine-tuning can further improve performance in specific medical subdomains.
  • MedSigLIP sets a new standard for medical image encoding.
Limitations:
  • The paper does not explicitly discuss its own limitations or constraints.
  • Further evaluation of the model's generalization ability may be necessary.
  • Additional validation on large-scale real-world medical datasets is needed.
  • In-depth discussion of medical ethics and privacy issues is still needed.