CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation
Author
Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge
Outline
CRISP-SAM2 is a model for multi-organ medical segmentation, an important task in medical image processing. To address the inaccurate details, dependence on geometric prompts, and loss of spatial information that limit existing models, it extends SAM2 with cross-modal interaction and semantic prompting. Visual and textual inputs are transformed into cross-modal contextualized semantics through an advanced cross-attention interaction mechanism, and these semantics are fed into the image encoder to strengthen its understanding of visual information. A semantic prompting strategy removes the dependency on geometric prompts, while a similarity-aligned self-updating strategy for memory and a mask refinement process adapt the model to medical images and enhance local details. Comparative experiments on seven public datasets show that CRISP-SAM2 outperforms existing models and, in particular, effectively addresses their limitations. The code is available at https://github.com/YU-deep/CRISP_SAM2.git.
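As a rough illustration of the cross-modal interaction described above, the sketch below shows how image patch features might attend to text embeddings via cross-attention and be fused back into the visual stream before reaching the decoder. This is a minimal PyTorch sketch under assumed names and shapes (`CrossModalInteraction`, `dim`, `num_heads`), not the authors' actual implementation.

```python
# Illustrative sketch (not the authors' code): cross-attention in which image
# patch features query text embeddings, producing cross-modal context that is
# fused back into the visual features fed to the image encoder/decoder.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Image features act as queries; text embeddings provide keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        # img_feats: (B, N_patches, dim), txt_feats: (B, N_tokens, dim)
        context, _ = self.attn(query=img_feats, key=txt_feats, value=txt_feats)
        # Residual fusion: enrich visual tokens with textual context.
        return self.norm(img_feats + context)

# Usage: the fused features would then continue through the segmentation pipeline.
fusion = CrossModalInteraction(dim=256)
img = torch.randn(2, 1024, 256)   # e.g. a 32x32 patch grid
txt = torch.randn(2, 16, 256)     # e.g. organ-name prompt embeddings
fused = fusion(img, txt)          # (2, 1024, 256)
```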
•
Takeaways:
◦
Effectively addresses the limitations of existing models in multi-organ medical segmentation (inaccurate details, dependence on geometric prompts, and loss of spatial information).
◦
Effectively leverages visual and textual information through cross-modal interaction and semantic prompting.
◦
Enhances local details through a similarity-aligned self-updating strategy for memory and a mask refinement process (see the sketch after this list).
◦
Demonstrates superior performance over existing models on seven public datasets.
◦
Enables reproducibility through publicly released code.
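The similarity-aligned self-updating memory mentioned above could, in spirit, look like the following sketch: the current slice feature is compared against stored memory entries by cosine similarity, and the best-matching entry is blended toward it only when the match is strong enough. The function name, threshold, and update rule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation): a memory bank whose
# entries are updated only when the incoming feature is sufficiently similar,
# blended in proportion to that similarity.
import torch
import torch.nn.functional as F

def update_memory(memory: torch.Tensor, new_feat: torch.Tensor,
                  threshold: float = 0.5) -> torch.Tensor:
    # memory: (M, dim) stored features; new_feat: (dim,) feature of the current slice.
    sims = F.cosine_similarity(memory, new_feat.unsqueeze(0), dim=-1)  # (M,)
    idx = torch.argmax(sims)
    if sims[idx] >= threshold:
        # Similarity-weighted moving average of the best-matching slot.
        alpha = sims[idx].clamp(0, 1)
        memory[idx] = alpha * memory[idx] + (1 - alpha) * new_feat
    else:
        # Otherwise store the feature as a new memory entry.
        memory = torch.cat([memory, new_feat.unsqueeze(0)], dim=0)
    return memory

mem = torch.randn(4, 256)
feat = torch.randn(256)
mem = update_memory(mem, feat)
```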
•
Limitations:
◦
Specific limitations are not explicitly discussed in the paper; additional experiments and analyses may be needed for a more in-depth evaluation.
◦
Performance may degrade for certain types of medical images or organs.
◦
Lack of detailed analysis of the computational cost and complexity of the model.