
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only

Created by
  • Haebom

Authors

Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Siyi Xie, Xinyang Yin, Chao Shen

Outline

This paper presents Babel, a novel framework for improving style preservation in neural machine translation (NMT). Unlike existing style-preserving approaches that require parallel corpora, Babel relies only on monolingual corpora. It consists of two main components: a style detector that identifies style inconsistencies based on contextual embeddings, and a diffusion-based style applier that corrects those inconsistencies while maintaining semantic integrity. Babel can be integrated into existing NMT systems as a post-processing module, enabling style-aware translation without architectural changes or parallel style data. Extensive experiments across five domains (law, literature, scientific papers, medicine, and educational content) show that Babel identifies style inconsistencies with 88.21% precision and improves style preservation by 150% while maintaining a high semantic similarity score of 0.92. Human evaluation further confirms that Babel-improved translations better preserve the style of the source text while remaining fluent and appropriate.
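
Below is a minimal, hypothetical sketch of how such a detect-then-rewrite post-processing step could wrap an existing NMT system. The class and function names (StyleDetector, StyleApplier, babel_postprocess), the embedding function, and the thresholding logic are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Babel-style post-processing around an existing NMT output.
# Names and thresholds are assumptions for illustration, not the authors' API.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


class StyleDetector:
    """Flags style inconsistencies by comparing contextual (style) embeddings
    of the source text and its translation."""

    def __init__(self, embed_fn, threshold: float = 0.8):
        self.embed_fn = embed_fn      # any sentence-embedding function
        self.threshold = threshold    # assumed consistency cutoff

    def is_consistent(self, source: str, translation: str) -> bool:
        score = cosine_similarity(self.embed_fn(source), self.embed_fn(translation))
        return score >= self.threshold


class StyleApplier:
    """Stand-in for the diffusion-based rewriter that adjusts the translation's
    style toward a reference while preserving its meaning."""

    def rewrite(self, translation: str, style_reference: str) -> str:
        # A real implementation would run conditional diffusion here.
        return translation


def babel_postprocess(source: str, translation: str,
                      detector: StyleDetector, applier: StyleApplier) -> str:
    """Keep the NMT output if its style already matches the source;
    otherwise rewrite it toward the source's style."""
    if detector.is_consistent(source, translation):
        return translation
    return applier.rewrite(translation, style_reference=source)
```

Because the whole pipeline sits after translation, it can be attached to any NMT system without retraining or architectural changes, which matches the post-processing design described above.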

Takeaways, Limitations

Takeaways:
Presents a novel method to improve style preservation in NMT using only monolingual corpora.
Can be easily integrated into existing NMT systems as a post-processing module.
Experimentally demonstrates improved style preservation and semantic similarity across diverse domains.
Confirms improved translation quality through human evaluation.
Limitations:
The paper lacks detailed descriptions of the types and sizes of the monolingual corpora used; further research is needed on generalizability to other corpora.
Because Babel operates as a post-processing module, it may do little to improve the style-learning ability of the NMT model itself.
Since results are reported for only five domains, further research is needed to determine generalizability to other domains.