LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
Created by
Haebom
Author
Ho Yin 'Sam' Ng, Ting-Yao Hsu, Aashish Anantha Ramakrishnan, Branislav Kveton, Nedim Lipka, Franck Dernoncourt, Dongwon Lee, Tong Yu, Sungchul Kim, Ryan A. Rossi, Ting-Hao 'Kenneth' Huang
Outline
To address the challenge of personalizing AI-generated figure captions, this paper presents LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides a multimodal profile containing other figures from the same document, including their images, captions, and figure-mentioning paragraphs. Experimental results show that using this profile information helps generate captions closer to the original author-written ones, and that the images in the profiles are more helpful than the textual information. The study thus moves beyond the limitations of text-only personalization methods and highlights the advantages of multimodal profiles.
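As a rough illustration of the dataset's structure (not the authors' code or schema), the sketch below shows how a LaMP-Cap-style record, pairing a target figure with other figures from the same document along with their images, captions, and figure-mentioning paragraphs, might be represented and flattened into a prompt for a multimodal model. All class, field, and function names here are hypothetical.
```python
# Minimal sketch of a LaMP-Cap-style profile record and prompt assembly.
# Names and fields are illustrative assumptions, not the dataset's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProfileFigure:
    image_path: str                                      # image of another figure from the same document
    caption: str                                         # its author-written caption
    mentions: List[str] = field(default_factory=list)    # paragraphs that mention this figure

@dataclass
class CaptionTask:
    target_image_path: str                               # figure whose caption should be generated
    profile: List[ProfileFigure]                         # multimodal profile from the same document

def build_prompt(task: CaptionTask) -> str:
    """Assemble the textual part of a prompt; images would be attached
    separately when calling a multimodal LLM."""
    parts = ["Write a caption for the target figure in this author's style.\n"]
    for i, fig in enumerate(task.profile, 1):
        parts.append(f"Profile figure {i} caption: {fig.caption}")
        for m in fig.mentions:
            parts.append(f"Profile figure {i} mention: {m}")
    parts.append("Target figure: [attached image]")
    return "\n".join(parts)

if __name__ == "__main__":
    task = CaptionTask(
        target_image_path="fig3.png",
        profile=[ProfileFigure("fig1.png",
                               "Overview of the proposed pipeline.",
                               ["Figure 1 shows the end-to-end pipeline."])],
    )
    print(build_prompt(task))
```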
Takeaways, Limitations
•
Takeaways:
◦
Experimental demonstration of the effectiveness of figure caption personalization using multimodal profiles.
◦
It suggests that the images in the profiles are more helpful than the textual information for generating personalized captions.
◦
The LaMP-Cap dataset can support future research on personalized figure caption generation.
•
Limitations:
◦
Further review of the size and diversity of the LaMP-Cap dataset is needed.
◦
Further research is needed to determine generalizability across different figure types and research fields.
◦
Further research is needed on selection and weighting strategies for profile information.