Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models

Created by
  • Haebom

Author

Erez Meoded

Outline

This paper studies how to improve handwritten text recognition (HTR) of historical documents by applying TrOCR, a state-of-the-art transformer-based HTR model, to a 16th-century Latin manuscript by Rudolf Gualter. Specifically, the author applies image preprocessing and a range of data augmentation techniques, including four novel techniques designed around the characteristics of historical handwriting, and evaluates ensemble learning methods that combine the strengths of the augmented models. A single augmented model (Elastic) achieves a character error rate (CER) of 1.86, while a voting ensemble of the top five models achieves a CER of 1.60. The ensemble result is a 42% relative improvement over the previous best-performing model and a 50% relative improvement over the existing TrOCR_BASE result. These findings demonstrate the importance of domain-specific augmentation and ensemble strategies.
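To make the reported metric concrete, the following is a minimal sketch of how a character error rate and a relative improvement are commonly computed. It is not the paper's evaluation code. The function names (`levenshtein`, `cer`, `relative_improvement`, `implied_baseline`) and the sample strings are hypothetical, and the sketch assumes the reported CER values are percentages. The baseline values it derives are only what the stated percentages imply; they are not figures taken from the summary.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction in error, in percent."""
    return 100.0 * (baseline - new) / baseline


def implied_baseline(new: float, relative_pct: float) -> float:
    """Baseline error implied by a new error and a stated relative improvement."""
    return new / (1.0 - relative_pct / 100.0)


# Hypothetical line of text with one substituted character out of seven.
print(cer("dominus", "dominvs"))

# Baselines implied by the stated improvements (assumption: CER given in percent).
print(implied_baseline(1.60, 50))  # implied TrOCR_BASE CER, 3.2
print(implied_baseline(1.60, 42))  # implied previous-best CER, about 2.76
```

Under these assumptions, a 50% relative improvement means the ensemble halves the error of the TrOCR_BASE result rather than lowering it by 50 percentage points.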

Takeaways, Limitations

Takeaways:
Domain-specific data augmentation techniques can significantly improve historical handwriting recognition performance.
Ensemble learning can achieve higher performance than a single model.
The paper presents a method for effectively applying the TrOCR model to historical manuscripts.
The approach achieves new state-of-the-art HTR performance on 16th-century Latin manuscripts.
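The idea behind a voting ensemble can be sketched as follows. This is an illustrative assumption, not the paper's method: the summary does not say how votes are counted, so the sketch uses the simplest variant, a majority vote over whole-line transcriptions with ties going to the higher-ranked model. The function name `vote_transcription` and the sample predictions are hypothetical.

```python
from collections import Counter
from typing import Sequence


def vote_transcription(predictions: Sequence[str]) -> str:
    """Return the most frequent transcription of one text line.

    `predictions` holds one output per model, ordered from the best-ranked
    model to the worst. Ties are broken in favour of the higher-ranked model.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk in rank order so the first prediction with the top count wins.
    for candidate in predictions:
        if counts[candidate] == top:
            return candidate
    raise AssertionError("unreachable: some prediction always has the top count")


# Hypothetical outputs of five augmented models for a single line.
line_predictions = [
    "in nomine domini",
    "in nomine domini",
    "in nomme domini",
    "in nomine domini",
    "in nomine dommi",
]
print(vote_transcription(line_predictions))  # in nomine domini
```

A vote of this kind helps when the models make different mistakes on the same line, which is one plausible reason for combining models trained with different augmentations.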
Limitations:
The study was limited to a specific period (16th century), language (Latin), and scribe (Rudolf Gualter). Generalizability to manuscripts from other periods, languages, and scribes requires further research.
Further research is needed to determine how well the proposed data augmentation techniques generalize and whether they apply to other HTR models.
Detailed information about the size and diversity of the datasets used is lacking.