Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

Created by
  • Haebom

Authors

Guangzhan Wang, Hongyu Zhang, Beijun Shen, Xiaodong Gu

Outline

This paper proposes LMTransplant, a novel text augmentation paradigm built on large language models (LLMs). Existing text augmentation methods focus mainly on lexical-level transformations and therefore struggle to produce diverse variants while preserving meaning. To overcome this, LMTransplant transplants the seed text into an expanded context generated by an LLM and then has the LLM regenerate a variant of the seed within that context. This lets the method draw on the knowledge embedded in the LLM to produce more diverse and creative content-level transformations while preserving the core attributes of the original text. LMTransplant outperforms existing methods on a variety of text-related tasks and scales well as the amount of augmented data grows.
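The summary describes the method only at a high level; the sketch below is one way the transplant-then-regenerate loop could be wired up, assuming a generic `llm` callable that maps a prompt string to a completion. The function name `lm_transplant`, the `[MASK]` placeholder, and the prompt wording are illustrative assumptions, not the authors' actual implementation.

```python
from typing import Callable

def lm_transplant(seed_text: str, llm: Callable[[str], str]) -> str:
    """Sketch of a transplant-then-regenerate augmentation step.

    Step 1 (transplant): ask the LLM to write surrounding context
    in which the seed text appears verbatim.
    Step 2 (regenerate): mask the seed inside that context and ask
    the LLM to rewrite the masked span, preserving its core meaning.
    Prompt wording here is an illustrative assumption.
    """
    # Step 1: embed the seed text into an LLM-generated context.
    context = llm(
        "Write a short passage that naturally contains the following "
        f"sentence, keeping it verbatim:\n{seed_text}"
    )

    # Step 2: replace the seed with a placeholder and regenerate it.
    # (Assumes the LLM kept the seed verbatim in step 1.)
    masked = context.replace(seed_text, "[MASK]")
    variant = llm(
        "Fill in [MASK] in the passage below with a sentence that fits "
        "the context and conveys the same core meaning as the original, "
        f"but with fresh wording:\n{masked}"
    )
    return variant
```

The mask-and-regenerate step is what distinguishes this paradigm from direct paraphrasing prompts: the LLM must produce a variant conditioned on the surrounding context rather than on the seed text alone, which is the mechanism the paper credits for the gain in diversity.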

Takeaways, Limitations

Takeaways:
  • Proposes a new LLM-based text augmentation method that overcomes the limitations of existing approaches.
  • Generates diverse and creative content-level variations rather than mere lexical-level edits.
  • Preserves the core attributes of the original text.
  • Scales well as the amount of augmented data increases.
  • Outperforms existing methods on a variety of text-related tasks.
Limitations:
  • The paper may not provide detailed information on the specific types and sizes of the LLMs used, or on the prompt engineering strategies.
  • Performance is evaluated only on specific tasks, so further research is needed to establish generalizability to other tasks.
  • Because the method depends on the underlying LLM, that LLM's limitations may carry over to LMTransplant's performance.