
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Created by
  • Haebom

Author

Tong Chen, Faeze Brahman, Jiacheng Liu, Niloofar Mireshghallah, Weijia Shi, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi

Outline

This paper proposes Paraphrase Preference Optimization (ParaPO), a post-training method that addresses the problem of language models (LMs) reproducing pre-training data verbatim. ParaPO trains LMs to paraphrase memorized content rather than output it word for word. The authors also propose a ParaPO variant that uses system prompts to allow appropriate use of famous quotations. Experiments on Llama3.1-8B and Tulu3-8B show that ParaPO reduces reproduction of memorized content more effectively than existing unlearning methods while preserving the model's general utility. In particular, the system-prompt variant reduces unwanted verbatim reproduction while retaining the ability to recall famous quotations when they are requested.
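The summary above describes ParaPO only at a high level. The sketch below illustrates the general shape of such a preference-optimization setup under assumptions of my own: preference pairs built from a shared prefix with the paraphrase preferred over the verbatim continuation, trained with a standard DPO-style loss. The helper names `build_preference_pair` and `dpo_loss` are hypothetical and are not the authors' released code.

```python
# Minimal sketch, assuming a DPO-style objective over (prefix, paraphrase, verbatim) triples.
import torch
import torch.nn.functional as F

def build_preference_pair(prefix: str, verbatim_continuation: str, paraphrase: str) -> dict:
    """Pair a memorized verbatim continuation (dispreferred) with a
    paraphrase of the same content (preferred), sharing one prompt."""
    return {
        "prompt": prefix,
        "chosen": paraphrase,               # prefer paraphrasing memorized text
        "rejected": verbatim_continuation,  # over reproducing it verbatim
    }

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective over the summed token log-probabilities of
    each response under the policy and a frozen reference model."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

In such a setup, the summed log-probabilities would come from scoring each chosen and rejected continuation with the policy and a frozen reference copy of the model.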

Takeaways, Limitations

Takeaways:
Presents a new method (ParaPO) that effectively mitigates verbatim reproduction of pre-training data (regurgitation).
Outperforms existing unlearning methods at reducing regurgitation.
System prompts can be used to control when verbatim reproduction is allowed (see the prompt sketch after this list).
Helps address copyright, plagiarism, privacy, and creativity concerns.
Limitations:
ParaPO's effectiveness may be limited to the models and datasets tested; further experiments on a wider range of models and datasets are needed.
The system-prompt variant may depend heavily on prompt engineering.
ParaPO may not eliminate all types of regurgitation.
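As a purely illustrative sketch of the system-prompt variant mentioned in the takeaways, the same trained model could be steered toward quoting or paraphrasing by the system message alone; the prompt wording below is an assumption, not the paper's.

```python
# Hypothetical system prompts; the paper's actual wording may differ.
ALLOW_QUOTES = "You may reproduce well-known quotations verbatim when the user asks for them."
AVOID_VERBATIM = "Paraphrase source material; do not reproduce long passages verbatim."

def make_chat(system_prompt: str, user_request: str) -> list:
    """Build a chat-format input to pass through the tokenizer's chat template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]
```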