In this paper, we propose OrthoInsight, a multimodal deep learning framework for thoracic fracture diagnosis. OrthoInsight integrates a YOLOv9 model for fracture detection in CT images, a medical knowledge graph for retrieving clinical context, and a fine-tuned LLaVA language model for generating diagnostic reports. By combining visual features from CT scans with expert-written textual knowledge, it produces clinically useful diagnostic output. Evaluated on 28,675 annotated CT images paired with expert reports, OrthoInsight achieves an average score of 4.28 across diagnostic accuracy, content completeness, logical consistency, and clinical guidance value, outperforming models such as GPT-4 and Claude-3. These results demonstrate the potential of multimodal learning to transform medical image analysis and provide effective support for radiologists.
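To make the three-stage pipeline concrete, the sketch below traces a single CT slice through detection, knowledge-graph retrieval, and report generation. The paper does not publish code, so every name here (`detect_fractures`, `FRACTURE_KG`, `retrieve_context`, `generate_report`, the input path) is a hypothetical placeholder, and the detector and language model are replaced by stubs; this is a minimal sketch of the data flow, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A fracture candidate from the YOLOv9 detector (hypothetical schema)."""
    label: str          # e.g. "rib_fracture"
    bbox: tuple         # (x1, y1, x2, y2) in pixel coordinates
    confidence: float


def detect_fractures(ct_slice_path: str) -> list[Detection]:
    """Stand-in for YOLOv9 inference on a chest CT slice.

    The real system would run a detector fine-tuned on annotated CT data;
    this stub returns a fixed example so the sketch is self-contained.
    """
    return [Detection(label="rib_fracture", bbox=(120, 88, 210, 150), confidence=0.91)]


# Toy medical knowledge graph mapping fracture types to clinical context.
# The paper's graph is far richer; this only illustrates the retrieval step.
FRACTURE_KG = {
    "rib_fracture": [
        "Rib fractures may be complicated by pneumothorax or hemothorax.",
        "Three or more contiguous rib fractures raise concern for flail chest.",
    ],
}


def retrieve_context(detections: list[Detection]) -> list[str]:
    """Look up clinical knowledge for each detected fracture type."""
    facts: list[str] = []
    for det in detections:
        facts.extend(FRACTURE_KG.get(det.label, []))
    return facts


def generate_report(detections: list[Detection], facts: list[str]) -> str:
    """Stand-in for the fine-tuned LLaVA model, which would condition on the
    image, the detections, and the retrieved knowledge to write the report."""
    findings = "; ".join(
        f"{d.label} at {d.bbox} (confidence {d.confidence:.2f})" for d in detections
    )
    return f"Findings: {findings}. Clinical notes: {' '.join(facts)}"


if __name__ == "__main__":
    dets = detect_fractures("chest_ct_slice_0042.png")  # hypothetical input
    facts = retrieve_context(dets)
    print(generate_report(dets, facts))
```

The design choice worth noting is that retrieval sits between detection and generation: the language model is conditioned not only on the image and bounding boxes but also on structured clinical knowledge, which is what grounds the generated report beyond what a vision-language model alone would produce.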