Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

Created by
  • Haebom

Authors

Muhammad Tayyab Khan, Zane Yong, Lequn Chen, Jun Ming Tan, Wenhe Feng, Seung Ki Moon

Outline

This paper proposes a hybrid deep learning framework for accurately extracting key information from 2D engineering drawings. Conventional OCR techniques produce unstructured output on such drawings because of their complex layouts and overlapping symbols. To address this, the framework integrates an oriented bounding box (OBB) detection model with a transformer-based document parsing model (Donut). YOLOv11 detects nine major categories (GD&T, general tolerances, dimensions, materials, annotations, radii, surface roughness, threads, and title blocks), and Donut is fine-tuned to generate structured JSON output. Two fine-tuning strategies are compared: a single model trained on all categories and category-specific models. The single model performs better across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1 score (97.3%), with fewer hallucinations (5.23%). The proposed framework improves accuracy, reduces manual work, and supports scalable deployment in precision-based industries.
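The detect-then-parse flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the `Detection` structure, the category key spellings, `stub_parser`, and `category_specific` are hypothetical names, and the stub stands in for the fine-tuned parsing model, which is assumed here to receive one cropped region per detection.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The nine categories named in the summary; the key spellings are assumptions.
CATEGORIES = (
    "gdt", "general_tolerances", "dimensions", "materials", "annotations",
    "radii", "surface_roughness", "threads", "title_block",
)


@dataclass
class Detection:
    """One oriented-bounding-box detection (hypothetical structure)."""
    category: str
    corners: List[Tuple[float, float]]  # four (x, y) corners of the rotated box
    crop_id: str                        # placeholder handle for the cropped patch


# A parser maps one detected region to a dictionary of structured fields.
Parser = Callable[[Detection], dict]


def category_specific(parsers: Dict[str, Parser]) -> Parser:
    """Wrap one parser per category behind the same interface as a single parser."""
    def dispatch(det: Detection) -> dict:
        return parsers[det.category](det)
    return dispatch


def parse_drawing(detections: List[Detection], parser: Parser) -> Dict[str, List[dict]]:
    """Group the parsed fields of every detection under its category."""
    output: Dict[str, List[dict]] = {name: [] for name in CATEGORIES}
    for det in detections:
        if det.category not in output:
            raise ValueError(f"unknown category: {det.category}")
        output[det.category].append(parser(det))
    return output


def stub_parser(det: Detection) -> dict:
    """Stand-in for the fine-tuned parsing model; returns placeholder fields."""
    return {"crop": det.crop_id, "fields": {}}


detections = [
    Detection("dimensions", [(10, 10), (60, 12), (59, 30), (9, 28)], "crop_0"),
    Detection("title_block", [(0, 200), (300, 200), (300, 260), (0, 260)], "crop_1"),
]

# Strategy 1: one parser for all categories.
single = parse_drawing(detections, stub_parser)

# Strategy 2: one parser per category, dispatched by the detected label.
per_category = parse_drawing(
    detections, category_specific({name: stub_parser for name in CATEGORIES})
)

print(json.dumps(single, indent=2))
```

Because both strategies share the `Parser` interface, swapping a single model for category-specific models changes only the callable passed to `parse_drawing`, which is one way such a comparison could be organized.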

Takeaways, Limitations

Takeaways:
We present a novel deep learning-based framework for accurately and efficiently extracting key information from 2D engineering drawings.
Integrating OBB detection with a Transformer-based document parsing model improves accuracy and reduces manual effort.
A single model fine-tuned on all categories outperforms category-specific models, with higher precision, recall, and F1 score and fewer hallucinations.
The framework supports scalable deployment in precision-based industries.
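As a rough check on how the reported metrics relate to each other, the sketch below computes precision, recall, F1, and a hallucination rate from counts of extracted fields. The summary does not define the hallucination metric; the code assumes it is the share of predicted fields that are wrong, which is consistent with 94.77% + 5.23% = 100% but is not confirmed by the source. The function name and the counts are illustrative, not data from the paper.

```python
def extraction_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute field-level metrics from true/false positives and false negatives.

    Assumption: hallucination rate = fp / (tp + fp), i.e. 1 - precision.
    """
    predicted = tp + fp   # fields the model produced
    actual = tp + fn      # fields present in the ground truth
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    hallucination = fp / predicted if predicted else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination": hallucination,
    }


# Illustrative counts chosen to reproduce 94.77% precision and 100% recall.
metrics = extraction_metrics(tp=9477, fp=523, fn=0)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these counts the F1 score comes out at about 0.973, so the reported 97.3% matches the harmonic mean of 94.77% precision and 100% recall, assuming those three figures refer to the same category.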
Limitations:
The performance evaluation of the proposed framework relies on a dataset built by the research team itself. Generalization performance across various drawing types and complexities needs to be verified.
Performance was evaluated for nine specific categories, and generalizability to other types of information extraction requires further study.
The framework depends on specific versions of the YOLOv11 and Donut models, and performance may vary when other models are used.
Further validation and optimization are required for application in real industrial environments.