Limitations: The use cases presented in this paper are restricted to GPT-4, so they may not generalize to other multimodal models. The paper may also lack an in-depth discussion of prompt engineering that combines diagrams with natural language. Because experimental results for various diagram types and for complex software engineering tasks are not presented, further research is needed to determine how broadly the findings apply.