Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Part-of-speech tagging for Nagamese Language using CRF

Created by
  • Haebom

Author

Alovi N Shohe, Chonglio Khiamungam, Teisovi Angami

Outline

This paper studies part-of-speech (POS) tagging for Nagamese, a core task in natural language processing (NLP). Nagamese is an Assamese-based creole used for trade communication between the Naga people of northeastern India and people of the Assam region. While POS tagging has been studied extensively for resource-rich languages such as English and Hindi, little research exists for Nagamese. This is the first attempt at POS tagging for Nagamese, aiming to identify the part of speech of each word in Nagamese sentences. We built an annotated corpus of 16,112 tokens and applied conditional random fields (CRF), a machine learning technique for sequence labeling. Using CRF, we achieved an overall tagging accuracy of 85.70%, a precision of 86%, and an F1 score of 85%.
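To make the CRF step more concrete, here is a minimal sketch of how a linear-chain CRF tagger picks a tag sequence at prediction time. It is not the authors' implementation: the feature template (`token_features`), the tag set, and the hand-set weights are all hypothetical, and the toy sentence is English rather than Nagamese. In a real tagger the weights would be learned from the annotated corpus.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def token_features(sentence: List[str], i: int) -> List[str]:
    """Hypothetical feature template: word identity, suffix, and neighbours."""
    word = sentence[i].lower()
    feats = [f"word={word}", f"suffix2={word[-2:]}"]
    feats.append(f"prev={sentence[i - 1].lower()}" if i > 0 else "BOS")
    feats.append(f"next={sentence[i + 1].lower()}" if i < len(sentence) - 1 else "EOS")
    return feats


def viterbi(sentence: List[str], tags: List[str],
            emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF."""
    if not sentence:
        return []

    def emit(i: int, tag: str) -> float:
        # Sum the weights of all (feature, tag) pairs active at position i.
        return sum(emit_w.get((f, tag), 0.0) for f in token_features(sentence, i))

    best = [{t: emit(0, t) for t in tags}]  # best score of a path ending in each tag
    back = []                               # back-pointers for path recovery
    for i in range(1, len(sentence)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + trans_w.get((p, t), 0.0))
            scores[t] = best[-1][prev] + trans_w.get((prev, t), 0.0) + emit(i, t)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold) if gold else 0.0


# Toy English example with hand-set (not learned) weights.
TAGS = ["DET", "NOUN", "VERB"]
EMIT = {("word=the", "DET"): 2.0, ("word=dog", "NOUN"): 2.0, ("word=runs", "VERB"): 2.0}
TRANS = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}
predicted = viterbi(["the", "dog", "runs"], TAGS, EMIT, TRANS)
print(predicted, token_accuracy(["DET", "NOUN", "VERB"], predicted))
```

The transition weights are what distinguish a CRF from tagging each token independently: a tag's score depends on the tag chosen for the previous token, so the decoder searches over whole sequences.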

Takeaways, Limitations

Takeaways:
The first study on part-of-speech tagging for Nagamese.
Building an annotated corpus of 16,112 tokens.
Achieving an overall tagging accuracy of 85.70% using the CRF model.
Limitations:
Lack of resources in the target language.
Room for improvement in model performance.
Further research is needed to determine model generalizability.