
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script

Created by
  • Haebom

Author

Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Henok Biadglign Ademtew, Hizkiel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam

Outline

This paper experimentally analyzes the impact of homophone normalization as used in Amharic Natural Language Processing (NLP). Previous studies have shown that homophone normalization improves scores on automatic evaluation metrics, but models trained on normalized data fail to handle the language's diverse spellings, and normalization can hurt transfer learning. The paper therefore analyzes the impact of normalization on monolingual training and cross-lingual transfer for languages that use the Ge'ez script, and proposes a post-processing method that applies normalization to model predictions instead of training data. The proposed method preserves linguistic features in training while improving the BLEU score by up to 1.03. The paper also contributes to the discussion of how technology affects language change and argues for linguistically aware interventions.
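
The post-processing idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the homophone table below is an assumed, commonly cited set of Amharic homophone rows, and the names `HOMOPHONE_ROWS`, `normalize_homophones`, and `postprocess_predictions` are hypothetical. The key point is that the model is trained on unnormalized text and the mapping is applied only to its output strings.

```python
# Assumed homophone rows (illustrative, not the paper's table).
# Keys are the first code point of the canonical Ethiopic row; values are
# the first code points of rows that are folded into it.
HOMOPHONE_ROWS = {
    0x1200: [0x1210, 0x1280],  # U+1200 (ሀ) <- U+1210 (ሐ), U+1280 (ኀ)
    0x1230: [0x1220],          # U+1230 (ሰ) <- U+1220 (ሠ)
    0x12A0: [0x12D0],          # U+12A0 (አ) <- U+12D0 (ዐ)
    0x1338: [0x1340],          # U+1338 (ጸ) <- U+1340 (ፀ)
}


def build_table():
    """Map each of the seven vowel orders of a variant row to the
    same order in its canonical row."""
    table = {}
    for canonical, variants in HOMOPHONE_ROWS.items():
        for variant in variants:
            for order in range(7):
                table[variant + order] = canonical + order
    return table


_TABLE = build_table()


def normalize_homophones(text: str) -> str:
    """Replace variant homophone characters with their canonical form."""
    return text.translate(_TABLE)


def postprocess_predictions(predictions):
    """Normalize model outputs only; training data is left untouched."""
    return [normalize_homophones(p) for p in predictions]


# Hypothetical model output spelled with U+1220; after post-processing
# it is spelled with U+1230.
outputs = ["\u1220\u120b\u121d"]
print(postprocess_predictions(outputs) == ["\u1230\u120b\u121d"])  # True
```

Because the mapping lives outside the model, it can be swapped or disabled per target language without retraining, which is the practical advantage the summary attributes to normalizing predictions rather than training data.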

Takeaways, Limitations

Takeaways:
Demonstrates that post-processing normalization can improve machine translation performance (BLEU score) while preserving linguistic features.
Clarifies the pros and cons of homophone normalization in Amharic NLP and proposes a better approach.
Provides important insights into the impact of technological advances on language change.
Emphasizes the importance of developing NLP models that take language characteristics into account.
Limitations:
The effectiveness of the proposed postprocessing method may be limited to languages using the Ge'ez script.
Further research is needed on generalizability to other languages and other NLP tasks.
Lack of extensive experimentation across different languages and tasks.