Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Created by
  • Haebom

Authors

Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora

Outline

This paper proposes moving from word-level OCR to line-level OCR. Conventional character-level OCR is prone to errors during character segmentation and makes limited use of language models. Word-level OCR addresses these issues but is itself prone to errors during word segmentation. Line-level OCR avoids these word detection errors and gives the recognizer a broader sentence context, which allows language models to be used more effectively. The authors also present a new dataset of 251 English page images for line-level OCR. Experimental results show that the proposed technique improves accuracy by 5.4% and efficiency fourfold compared to conventional word-level OCR.
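
The difference in granularity between the two schemes can be pictured with a toy sketch. This is not the paper's implementation: the `Page` type and the `recognizer_calls` function are hypothetical names introduced here, and the code only counts how many image crops a recognizer would receive under each scheme. The reported fourfold efficiency gain comes from the authors' experiments, not from this count.

```python
from typing import List

# Toy stand-in for a detected page: a list of lines, each a list of words.
# (Hypothetical representation; a real pipeline works on image regions.)
Page = List[List[str]]


def recognizer_calls(page: Page, level: str) -> int:
    """Count the crops a recognizer would process at the given granularity."""
    if level == "word":
        # Word-level OCR: one crop per detected word.
        return sum(len(line) for line in page)
    if level == "line":
        # Line-level OCR: one crop per non-empty text line.
        return sum(1 for line in page if line)
    raise ValueError(f"unknown level: {level!r}")


page: Page = [
    "Why stop at words".split(),
    "when a full line gives more context".split(),
]

print(recognizer_calls(page, "word"))  # 11
print(recognizer_calls(page, "line"))  # 2
```

In the word-level case each crop carries a single word, so a segmentation mistake cannot be repaired from context. In the line-level case the recognizer sees the neighboring words as well.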

Takeaways, Limitations

Takeaways:
Proposal of a line-level OCR technique that overcomes the limitations of word-level OCR and improves accuracy and efficiency.
A new dataset for line-level OCR is released.
Experimentally verified improved accuracy (5.4%) and efficiency (4x improvement).
Suggests the possibility of further performance improvements as large-scale language models develop in the future.
Limitations:
Because no public dataset for line-level OCR was available, the authors had to build their own.
The dataset currently covers English only. Expansion to other languages is needed.