Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

Created by
  • Haebom

Authors

Guangzhan Wang, Hongyu Zhang, Beijun Shen, Xiaodong Gu

Outline

This paper proposes LMTransplant, a novel text augmentation paradigm that leverages large language models (LLMs). Rather than making surface-level lexical changes, as conventional methods such as back-translation do, LMTransplant draws on the knowledge embedded in LLMs to produce diverse and creative content-level variations. It does so through a transplant-then-regenerate strategy: the seed text is transplanted into an expanded context written by the LLM, and the LLM then regenerates a variant of the original text from that context. Experiments show that LMTransplant outperforms existing methods and scales well as the amount of augmented data grows.
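The two-step flow described above can be sketched in a few lines of prompting logic. The sketch below is illustrative only: the `complete()` helper, the prompt wording, and the `[MASK]` convention are assumptions for exposition, not the paper's exact prompts or implementation.

```python
# Minimal sketch of the transplant-then-regenerate idea, assuming a
# generic LLM completion helper. All prompt text here is hypothetical.

def complete(prompt: str) -> str:
    """Placeholder for an LLM call (plug in your own API client here)."""
    raise NotImplementedError("wire this to an actual LLM endpoint")

def lm_transplant(seed_text: str) -> str:
    # Step 1 (transplant): embed the seed text verbatim inside a richer,
    # LLM-written surrounding context.
    expand_prompt = (
        "Write a short passage in which the following text appears verbatim, "
        "adding a natural sentence before and after it:\n\n" + seed_text
    )
    expanded = complete(expand_prompt)

    # Step 2 (regenerate): mask out the original seed and ask the LLM to
    # rewrite that span, yielding a content-level variant that still fits
    # the surrounding context. (Assumes the seed was kept verbatim above.)
    masked = expanded.replace(seed_text, "[MASK]")
    regen_prompt = (
        "Rewrite the [MASK] span with new text that fits the surrounding "
        "context but differs in wording and content from the original:\n\n"
        + masked
    )
    return complete(regen_prompt)
```

Because the surrounding context constrains the regeneration, the variant stays on-topic while differing from the seed at the content level rather than only the lexical level.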

Takeaways, Limitations

Takeaways:
Overcomes the limitations of existing text augmentation methods by leveraging LLMs, introducing a new way to generate diverse and creative content-level variations.
LMTransplant shows superior performance and scalability compared to existing methods.
Shows that the knowledge embedded in LLMs can be effectively harnessed to improve the quality of text augmentation.
Limitations:
The performance gains may be limited to specific datasets or tasks.
LLM outputs can be difficult to control and may depend heavily on prompt engineering.
Using LMTransplant effectively may require substantial computing resources.