Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation

Created by
  • Haebom

Authors

Deepon Halder, Thanmay Jayakumar, Raj Dabre

Outline

This paper proposes CycleDistill, a bootstrapping approach for building high-quality machine translation (MT) systems for low-resource languages. CycleDistill leverages a large language model (LLM) and few-shot translation to iteratively generate synthetic parallel corpora from monolingual corpora, then fine-tunes the model on the generated data. Generating the parallel corpora requires only 1-4 few-shot examples, and experiments on three Indian languages show that high-quality MT can be achieved even when only monolingual corpora are available, improving on a few-shot baseline by 20-30 chrF points on average in the first iteration. The paper also investigates the effect of leveraging softmax activations during the distillation process and observes a slight further improvement in translation quality.
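Below is a minimal sketch of the cyclical loop described above, assuming hypothetical `translate_fn` and `finetune_fn` callables in place of the paper's actual LLM prompting and fine-tuning code:

```python
# Sketch of the CycleDistill loop: translate a monolingual corpus with the
# current model, fine-tune on the resulting synthetic pairs, and repeat.
# `translate_fn` and `finetune_fn` are hypothetical stand-ins, not the
# paper's released implementation.
from typing import Callable, List, Tuple

def cycle_distill(
    monolingual_corpus: List[str],
    few_shot_examples: List[Tuple[str, str]],  # the 1-4 (source, target) pairs
    translate_fn: Callable[[str, List[Tuple[str, str]]], str],
    finetune_fn: Callable[[List[Tuple[str, str]]], None],
    iterations: int = 3,
) -> None:
    for _ in range(iterations):
        # 1. Prompt the current model with the few-shot examples to
        #    translate every sentence in the monolingual corpus.
        synthetic_corpus = [
            (src, translate_fn(src, few_shot_examples))
            for src in monolingual_corpus
        ]
        # 2. Fine-tune on the synthetic parallel corpus; the improved
        #    model generates better synthetic data in the next iteration.
        finetune_fn(synthetic_corpus)
```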

Takeaways, Limitations

Takeaways:
Presents an effective bootstrapping method for developing high-quality machine translation systems for low-resource languages.
Shows that strong performance can be achieved with very small amounts of data (1-4 few-shot examples).
Demonstrates that parallel corpora can be generated, and models trained, from monolingual corpora alone.
Verifies the effectiveness of distillation using softmax activations (a minimal sketch follows after this list).
Limitations:
Further research is needed on how well the method generalizes to other low-resource languages and diverse language pairs.
A more in-depth analysis is needed of how the quality of the synthetic parallel corpora affects final translation performance.
Further experiments are needed on how the type and size of the LLM used affect the results.
Performance evaluation with metrics beyond the chrF score is needed (a chrF usage example is included below).
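As a rough illustration of the softmax-activation distillation mentioned in the takeaways, here is a minimal soft-label distillation loss, assuming PyTorch and placeholder per-token vocabulary logits from a teacher and a student model; the exact objective used in the paper may differ:

```python
import torch
import torch.nn.functional as F

def soft_distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    teacher_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    temperature: float = 1.0,
) -> torch.Tensor:
    # The teacher's softmax activations serve as soft targets for the student.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions;
    # temperature**2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(
        student_log_probs, teacher_probs, reduction="batchmean"
    ) * temperature**2
```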
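For reference, the chrF metric reported in the paper can be computed with the sacreBLEU library; the sentences below are made-up examples, not data from the paper:

```python
from sacrebleu.metrics import CHRF

chrf = CHRF()
hypotheses = ["the cat sat on the mat"]           # system outputs
references = [["the cat is sitting on the mat"]]  # one reference stream
print(chrf.corpus_score(hypotheses, references))  # prints e.g. "chrF2 = ..."
```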