Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention

Created by
  • Haebom

Author

Shree Mitra, Ritabrata Chakraborty, Nilkanta Sahu

Outline

This paper presents a novel self-supervised learning (SSL) framework for handwritten mathematical expression recognition (HMER), designed to remove the dependence on the expensive labeled data that conventional approaches require. The framework pretrains an image encoder by combining global and local contrastive losses, so that it learns both holistic and fine-grained representations. On top of this, the authors propose a self-supervised attention network trained with a progressive spatial masking strategy: without any supervision, the attention mechanism learns to focus on meaningful regions such as operators, exponents, and nested mathematical notation. The progressive masking curriculum makes the network increasingly robust to missing or occluded visual information, strengthening its structural understanding. The overall pipeline consists of (1) self-supervised pretraining of the encoder, (2) self-supervised attention training, and (3) supervised fine-tuning with a Transformer decoder that generates LaTeX sequences. Extensive experiments on the CROHME benchmark demonstrate the effectiveness of the progressive attention mechanism, with the proposed method outperforming existing SSL and fully supervised baselines.
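The paper summary does not include code, but a minimal PyTorch-style sketch may help illustrate what a progressive spatial masking curriculum could look like: the fraction of hidden image patches grows as training proceeds, so the encoder must increasingly reconstruct structure from partial input. The names `mask_ratio_schedule`, `random_patch_mask`, and the `encoder(images, patch_mask=...)` interface are illustrative assumptions, not the authors' API.

```python
import torch


def mask_ratio_schedule(step: int, total_steps: int,
                        start: float = 0.1, end: float = 0.5) -> float:
    """Linearly increase the masked-patch ratio as training progresses."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)


def random_patch_mask(batch_size: int, num_patches: int, ratio: float,
                      device: torch.device) -> torch.Tensor:
    """Boolean mask of shape (batch_size, num_patches); True = patch hidden."""
    num_masked = int(num_patches * ratio)
    scores = torch.rand(batch_size, num_patches, device=device)
    idx = scores.argsort(dim=1)[:, :num_masked]  # patches chosen for masking
    mask = torch.zeros(batch_size, num_patches, dtype=torch.bool, device=device)
    mask.scatter_(1, idx, True)
    return mask


# Hypothetical use inside a training loop:
# ratio = mask_ratio_schedule(step, total_steps)
# mask = random_patch_mask(images.size(0), num_patches, ratio, images.device)
# features = encoder(images, patch_mask=mask)  # encoder interface is assumed
```

One design note: a linear schedule is only one option; the paper's actual curriculum (staged, cosine, or otherwise) may differ.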

Takeaways, Limitations

Takeaways:
Presents a novel SSL framework that trains high-performance handwritten mathematical expression recognition models without expensive labeled data (a sketch of one possible form of its contrastive objective appears after this list).
Improves structural understanding of mathematical expressions through a self-supervised attention network built on a progressive spatial masking strategy.
Achieves performance superior to existing SSL and fully supervised models on the CROHME benchmark.
Helps address the data-scarcity problem in HMER through an efficient self-supervised learning method.
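As a rough illustration of the combined global-and-local contrastive objective mentioned above, here is a hedged sketch assuming an NT-Xent (InfoNCE-style) loss applied once to pooled image embeddings and once to patch embeddings, mixed with a weight `lam`. The paper's exact losses and weighting may differ; `nt_xent`, `combined_loss`, and `lam` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE between two augmented views; z1, z2 have shape (N, D)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                    # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)       # diagonal pairs are positives


def combined_loss(g1, g2, p1, p2, lam: float = 0.5) -> torch.Tensor:
    """Global term on pooled features (N, D); local term on patches (N, P, D)."""
    global_term = nt_xent(g1, g2)
    local_term = nt_xent(p1.flatten(0, 1), p2.flatten(0, 1))
    return global_term + lam * local_term
```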
Limitations:
The generalization ability of the proposed method remains to be evaluated; further experiments are needed on mathematical expressions of varying styles and complexity.
The method may be vulnerable to certain types of mathematical notation or handwriting.
The optimal parameters for the progressive masking strategy require further study.
Performance evaluation and comparative analysis on large-scale datasets are lacking.