Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models

Created by
  • Haebom

Authors

Hamza Rasaee, Taha Koleilat, Hassan Rivaz

Outline

This paper notes that accurate, generalizable object segmentation in ultrasound images is difficult because of anatomical variation, diverse imaging protocols, and limited annotated data. To address this, the authors propose a prompt-driven Vision-Language Model (VLM) that integrates Grounding DINO with SAM2. They use 18 publicly available ultrasound datasets covering the breast, thyroid, liver, prostate, kidney, and paraspinal muscles. Fifteen datasets are used to fine-tune and validate Grounding DINO with Low-Rank Adaptation (LoRA), and the remaining three are held out for testing on unseen distributions. In the reported experiments, the method outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS, on most of the seen datasets, and it maintains strong performance on the unseen datasets without additional fine-tuning. The authors conclude that VLMs can reduce reliance on large, organ-specific annotated datasets and are promising for scalable, robust ultrasound image analysis.
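To make the Low-Rank Adaptation (LoRA) step mentioned above more concrete, here is a minimal sketch of the general LoRA idea in plain Python. It is not the authors' code, which has not been released. The class name `LoRALinear`, the helper `matmul`, and the parameters `rank`, `alpha`, and `seed` are illustrative assumptions. The sketch shows only the core mechanism: a pretrained weight matrix stays frozen, and a small low-rank product `B @ A` is added to it, so far fewer parameters need training.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration of LoRA; names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A gets small random values and B starts at zero, so the adapted
        # layer initially behaves exactly like the pretrained layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameter_count(self):
        """Count only the entries of A and B; W is frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    pretrained = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
    layer = LoRALinear(pretrained, rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0, 1.0, 1.0]))
    print(layer.trainable_parameter_count())
```

In this toy layer the frozen matrix has 8 entries while the adapter has 6. The saving becomes substantial at realistic sizes, because the adapter grows with `rank * (d_in + d_out)` rather than `d_in * d_out`.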

Takeaways, Limitations

Takeaways:
Shows that a prompt-based VLM can outperform existing approaches to object segmentation in ultrasound images.
Generalizes well across the organs studied (breast, thyroid, liver, prostate, kidney, and paraspinal muscles).
Improves on existing state-of-the-art methods on most of the evaluated datasets.
Reduces reliance on large-scale, organ-specific annotation data.
Points toward scalable and robust ultrasound image analysis.
Limitations:
The evaluation relies on a limited number of public datasets.
Generalization in real clinical settings still needs further study.
The code will be released only after the paper is accepted.