Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Created by
  • Haebom

Authors

Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

Outline

CRISP-SAM2 is a SAM2-based model for multi-organ medical segmentation, a task that plays an important role in medical image processing. To address three problems of existing models (inaccurate details, dependence on geometric prompts, and loss of spatial information), we present a new model based on cross-modal interaction and semantic prompting. We transform visual and textual inputs into cross-modal contextualized semantics through a cross-attention interaction mechanism and feed them into the image encoder to enhance its understanding of visual information. We use a semantic prompting strategy to remove the dependency on geometric prompts, and we apply a similarity-aligned self-updating strategy for memory together with a mask refinement process to adapt to medical images and enhance local details. Comparative experiments on seven public datasets show that our model outperforms existing models and, in particular, effectively addresses the limitations listed above. The code is available at https://github.com/YU-deep/CRISP_SAM2.git.
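The cross-attention idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only generic single-head scaled dot-product cross-attention in which visual tokens act as queries and text tokens supply keys and values. The function names, the projection matrices w_q, w_k, w_v, and all dimensions are hypothetical, and the weights are random rather than learned. Pure Python is used to avoid external dependencies.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual_tokens, text_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, not CRISP-SAM2 itself).

    visual_tokens: n_vis rows of length d_model (queries come from here)
    text_tokens:   n_txt rows of length d_model (keys and values come from here)
    w_q, w_k, w_v: hypothetical d_model x d_head projection matrices
    Returns (fused, weights): n_vis x d_head and n_vis x n_txt.
    """
    q = matmul(visual_tokens, w_q)
    k = matmul(text_tokens, w_k)
    v = matmul(text_tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    # Each visual token gets a probability distribution over the text tokens.
    weights = [softmax([s / scale for s in row]) for row in scores]
    # Each output row is a text-conditioned mixture of the value vectors.
    return matmul(weights, v), weights


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random stand-in for learned parameters or encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


d_model, d_head = 16, 8          # assumed sizes, chosen only for the demo
visual = rand_matrix(6, d_model)  # e.g. 6 image patch embeddings
text = rand_matrix(4, d_model)    # e.g. 4 text token embeddings
w_q, w_k, w_v = (rand_matrix(d_model, d_head) for _ in range(3))

fused, weights = cross_attention(visual, text, w_q, w_k, w_v)
print(len(fused), len(fused[0]), len(weights), len(weights[0]))
```

In this sketch each row of weights sums to 1, so every visual token receives a weighted mixture of text features. How the paper's mechanism stacks such layers and injects the result into the SAM2 image encoder is not specified in this summary; the repository linked above is the authoritative source.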

Takeaways, Limitations

Takeaways:
Effectively addresses the limitations (inaccurate details, geometric prompt dependency, spatial information loss) of existing models in multi-organ medical segmentation.
Effectively leverages visual and textual information through cross-modal interaction and semantic prompting.
Enhances local details through a similarity-aligned self-updating strategy for memory and a mask refinement process.
Outperforms existing models in comparative experiments on seven public datasets.
Supports reproducibility through publicly released code.
Limitations:
Specific limitations are not explicitly mentioned in the paper. Additional experiments and analyses may be needed for a more in-depth evaluation.
Performance may be reduced for certain types of medical images or organs.
The paper lacks a detailed analysis of the model's computational cost and complexity.