In this paper, we propose a method that exploits largely untapped Japanese PDF data to address the limited performance of Japanese large multimodal models (LMMs), which stems from the scarcity of high-quality Japanese training data. Instead of relying on existing datasets translated from English, we build a pipeline that automatically extracts image-text pairs from PDFs with a pre-trained model and generates additional instruction data from the extracted pairs. Training a Japanese LMM on the generated data improves performance by 2.1% to 13.8% on Heron-Bench, a benchmark for Japanese LMMs. We further verify the utility of PDF data through analyses of how the gains vary with model size and with the underlying language model.
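The abstract describes extracting image-text pairs from PDFs with a pre-trained model; the paper's actual pipeline is not specified here. As a rough illustration of the kind of pairing involved, the sketch below uses PyMuPDF (an assumed tool, not the paper's) to associate each embedded image with the text of the page it appears on.

```python
# Minimal sketch (assumption, not the paper's pipeline): pair each embedded
# image in a PDF with the text of its page, as a stand-in for the image-text
# pairs that a pre-trained extraction model would produce.
import fitz  # PyMuPDF


def extract_image_text_pairs(pdf_path: str) -> list[dict]:
    """Return {page, text, image_bytes, ext} records for one PDF."""
    pairs = []
    with fitz.open(pdf_path) as doc:
        for page_index, page in enumerate(doc):
            page_text = page.get_text("text").strip()
            for img in page.get_images(full=True):
                xref = img[0]                   # cross-reference id of the image object
                info = doc.extract_image(xref)  # raw image bytes plus metadata
                pairs.append({
                    "page": page_index,
                    "text": page_text,          # surrounding page text as a caption proxy
                    "image_bytes": info["image"],
                    "ext": info["ext"],         # e.g. "png", "jpeg"
                })
    return pairs


if __name__ == "__main__":
    # Hypothetical input file; prints one summary line per extracted pair.
    for pair in extract_image_text_pairs("example.pdf"):
        print(pair["page"], pair["ext"], len(pair["image_bytes"]), pair["text"][:40])
```

In practice, such raw pairs would still need filtering and caption refinement before being turned into instruction data for LMM training.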