This page curates AI-related papers from around the world. All content is summarized using Google Gemini, and the page is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
Created by
Haebom
Author
Ho Yin 'Sam' Ng, Ting-Yao Hsu, Aashish Anantha Ramakrishnan, Branislav Kveton, Nedim Lipka, Franck Dernoncourt, Dongwon Lee, Tong Yu, Sungchul Kim, Ryan A. Rossi, Ting-Hao 'Kenneth' Huang
Outline
This paper highlights the need to personalize AI-generated figure captions so they match the author's style and the conventions of the field. To that end, the authors introduce LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides not only its image but also up to three profile figures from the same document, each with its image, caption, and the paragraph that mentions it, to characterize the document's context. Experiments show that using profile information helps generate captions closer to the author-written originals, and that the images in the profiles are more informative than the figure-mentioning paragraphs, demonstrating the advantage of multimodal profiles.
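For illustration only, here is a minimal sketch of what one LaMP-Cap-style example and a simple profile-conditioned prompt might look like. The class names, field names, and prompt format below are hypothetical assumptions for clarity, not the dataset's actual schema or the paper's method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileFigure:
    """One profile figure from the same document (field names are hypothetical)."""
    image_path: str            # image of the profile figure
    caption: str               # author-written caption of the profile figure
    mentioning_paragraph: str  # paragraph in the paper that mentions this figure


@dataclass
class LaMPCapExample:
    """One target figure plus up to three profile figures used for personalization."""
    target_image_path: str                                        # image of the figure to caption
    target_mentioning_paragraph: str                              # paragraph mentioning the target figure
    profiles: List[ProfileFigure] = field(default_factory=list)   # up to three profile figures
    gold_caption: str = ""                                        # author-written caption, used as reference


def build_prompt(ex: LaMPCapExample) -> str:
    """Assemble a text prompt that interleaves profile captions and their context paragraphs."""
    parts = ["Write a caption for the target figure in the authors' style."]
    for i, p in enumerate(ex.profiles, start=1):
        parts.append(f"Profile figure {i} caption: {p.caption}")
        parts.append(f"Profile figure {i} context: {p.mentioning_paragraph}")
    parts.append(f"Target figure context: {ex.target_mentioning_paragraph}")
    return "\n".join(parts)
```

In a multimodal setup, the profile and target images would also be passed to the model alongside this text; the sketch only shows how the textual parts of the profiles could be organized.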
Takeaways, Limitations
•
Takeaways:
◦
We empirically demonstrate the utility of multimodal (image and text) profiles for generating personalized figure captions.
◦
The LaMP-Cap dataset is expected to make a significant contribution to future research on personalized figure caption generation.
◦
We found that the image information in a profile is more effective for caption generation than its text information.
•
Limitations:
◦
Further review of the size and diversity of the LaMP-Cap dataset is needed.
◦
The dataset needs to be expanded to more comprehensively cover different figure types and author styles.
◦
Consideration should be given to the possibility of overfitting to specific domains or author styles.