This page curates AI-related papers published worldwide. All content here is summarized using Google Gemini, and the page is run on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
Created by
Haebom
Author
Quang-Trung Truong, Yuk-Kwan Wong, Vo Hoang Kim Tuyen Dang, Rinaldi Gotama, Duc Thanh Nguyen, Sai-Kit Yeung
Outline
Marine videos pose significant challenges for video understanding because of the dynamics of marine objects and their surroundings, camera motion, and the complexity of underwater scenes. Existing video captioning datasets, which focus on general or human-centric domains, often fail to generalize to the complexity of the marine environment and offer little insight into marine life. To address these limitations, this paper proposes a two-stage, marine object-oriented video captioning pipeline. It introduces a comprehensive video understanding benchmark built on triplets of video, text, and segmentation masks to support visual grounding and caption generation, which improves marine video understanding and analysis as well as marine video generation. The paper also highlights the effectiveness of splitting videos into clips to detect salient object transitions across scene changes, which significantly enriches the semantics of the captions. The dataset and code are publicly available at https://msc.hkustvgd.com.