Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Tenyidie Syllabification corpus creation and deep learning applications

Created by
  • Haebom

Authors

Teisovi Angami, Kevisino Khate

Outline

This paper studies syllable segmentation (syllabification) for Tenyidie, a low-resource Tibeto-Burman language spoken by the Tenyimia community in Nagaland, northeastern India. Tenyidie is tonal, has subject-object-verb (SOV) word order, and is highly agglutinative. The authors built a corpus of 10,120 syllable-segmented Tenyidie words and applied four deep learning architectures to it: LSTM, BLSTM, BLSTM+CRF, and an encoder-decoder model. With an 80:10:10 (training:validation:test) split, the BLSTM model achieved the highest accuracy, 99.21%.
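The summary does not say how the segmentation task is presented to the networks. A common framing for LSTM, BLSTM, and BLSTM+CRF taggers, assumed here rather than taken from the paper, is character-level sequence labeling: each character is labeled 1 if it starts a syllable and 0 otherwise. The sketch below shows that encoding and decoding plus an 80:10:10 shuffle-and-split. The hyphen delimiter, the function names (`to_labels`, `from_labels`, `split_80_10_10`), and the seed are all hypothetical, and the example strings are placeholders, not Tenyidie words.

```python
import random
import unicodedata

# Hypothetical corpus format: syllables within a word separated by "-".
# This delimiter is an illustrative assumption, not the paper's actual format.
SEP = "-"


def to_labels(segmented: str) -> tuple[str, list[int]]:
    """Turn 'ab-cd-e' into the unsegmented word plus one label per character.

    Label 1 marks a character that begins a syllable; 0 marks a character
    that continues the current syllable. Text is NFC-normalized first so that
    letters with diacritics count as single characters where possible.
    """
    segmented = unicodedata.normalize("NFC", segmented)
    chars, labels = [], []
    for syllable in segmented.split(SEP):
        for i, ch in enumerate(syllable):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)
    return "".join(chars), labels


def from_labels(word: str, labels: list[int]) -> str:
    """Rebuild the segmented form from (predicted) boundary labels."""
    out = []
    for i, (ch, lab) in enumerate(zip(word, labels)):
        if lab == 1 and i > 0:  # a new syllable starts here: insert delimiter
            out.append(SEP)
        out.append(ch)
    return "".join(out)


def split_80_10_10(items, seed=0):
    """Shuffle deterministically, then split into train/validation/test (80:10:10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For a 10,120-word corpus, this split yields 8,096 training, 1,012 validation, and 1,012 test words. A tagger such as a BLSTM would then be trained to predict the label sequence from the character sequence, and `from_labels` turns its predictions back into segmented words.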

Takeaways, Limitations

Takeaways:
It is the first study of syllable segmentation in Tenyidie.
It advances NLP research on Tenyidie, a low-resource language.
The constructed corpus can be reused for other NLP tasks such as morphological analysis, part-of-speech tagging, and machine translation.
The BLSTM model achieves high accuracy (99.21%).
Limitations:
The paper does not explicitly discuss its limitations.