This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
Created by
Haebom
Author
Quang-Trung Truong, Yuk-Kwan Wong, Vo Hoang Kim Tuyen Dang, Rinaldi Gotama, Duc Thanh Nguyen, Sai-Kit Yeung
Outline
This paper addresses the challenges of marine video understanding, which stem from the dynamics of marine objects and their surroundings, camera motion, and the complexity of underwater scenes. Existing video captioning datasets typically focus on generic or human-centric domains, so they often fail to generalize to the marine environment or to provide insights into marine life. To address these limitations, the authors propose a two-stage, marine object-oriented video captioning pipeline. They introduce a comprehensive video understanding benchmark built on triplets of video, text, and segmentation masks, which supports visual grounding and caption generation. This benefits marine video understanding and analysis, as well as marine video generation. The authors also highlight that video splitting is effective at detecting salient object transitions across scene changes, which significantly enriches the semantics of the captions. The dataset and code are publicly available at https://msc.hkustvgd.com.
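To make the idea of splitting footage at object transitions more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `ClipSample` record, the `split_on_object_transitions` function, and the rule "start a new clip whenever the set of segmented object labels changes" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class ClipSample:
    """Hypothetical record pairing a clip with its caption and object labels."""
    start_frame: int               # inclusive
    end_frame: int                 # exclusive
    caption: str
    object_labels: Set[str]        # labels of segmented objects visible in the clip


def split_on_object_transitions(frame_labels: List[Set[str]]) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, cutting wherever the visible object set changes.

    Assumption: frame_labels[i] holds the labels taken from the segmentation
    masks of frame i. Real footage would likely need smoothing to ignore
    single-frame flicker; this sketch omits that.
    """
    if not frame_labels:
        return []
    clips: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(frame_labels)):
        if frame_labels[i] != frame_labels[i - 1]:
            clips.append((start, i))   # close the current clip at the transition
            start = i
    clips.append((start, len(frame_labels)))  # close the final clip
    return clips
```

Each resulting range could then be wrapped in a `ClipSample` and passed to a captioning model, so that every caption describes a span in which the set of visible animals is stable.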