Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications

Created by
  • Haebom

Authors

Anran Li, Lingfei Qian, Mengmeng Du, Yu Yin, Yan Hu, Zihao Sun, Yihang Fu, Erica Stutz, Xuguang Ai, Qianqian

Outline

This paper presents the first comprehensive evaluation of data memorization in large language models (LLMs) adapted to the medical domain. The authors systematically analyze three common adaptation scenarios: continued pretraining on medical corpora, fine-tuning on standard medical benchmarks, and fine-tuning on real clinical data (including over 13,000 hospitalization records from the Yale New Haven Health System), assessing the frequency, nature, volume, and potential impact of memorization in each. Across all three scenarios, memorization occurs at a markedly higher rate than in the general domain, with direct implications for developing and deploying LLMs in healthcare. Memorized content falls into three categories: beneficial (e.g., accurate reproduction of clinical guidelines and biomedical references), uninformative (e.g., repeated disclaimers or formulaic medical-document boilerplate), and harmful (e.g., reproduction of dataset-specific or sensitive clinical content). The paper closes with practical recommendations for promoting beneficial memorization, minimizing uninformative memorization, and mitigating harmful memorization.
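The paper's exact evaluation protocol is not reproduced here, but a common way to probe for verbatim memorization is prefix prompting: feed the model the opening tokens of a training document and measure how much of the true continuation it reproduces under greedy decoding. The sketch below illustrates this under assumed choices (the model name "gpt2" is a stand-in for an adapted medical LLM, the 50/50 prefix–suffix split is arbitrary, and Hugging Face transformers is assumed); it is an illustrative sketch, not the authors' implementation.

```python
# Minimal sketch of a prefix-prompting memorization probe.
# All names and split lengths are illustrative assumptions,
# not the paper's protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the adapted medical LLM
PREFIX_LEN, SUFFIX_LEN = 50, 50  # token counts for prompt and reference

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def memorized_token_overlap(text: str) -> float:
    """Fraction of the reference suffix reproduced verbatim from the prefix."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.size(0) < PREFIX_LEN + SUFFIX_LEN:
        return 0.0  # document too short to split into prefix and suffix
    prefix = ids[:PREFIX_LEN].unsqueeze(0)
    reference = ids[PREFIX_LEN:PREFIX_LEN + SUFFIX_LEN]
    with torch.no_grad():
        out = model.generate(
            prefix,
            max_new_tokens=SUFFIX_LEN,
            do_sample=False,  # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )
    generated = out[0, PREFIX_LEN:PREFIX_LEN + SUFFIX_LEN]
    # Count leading tokens that match the true continuation exactly.
    matches = 0
    for g, r in zip(generated.tolist(), reference.tolist()):
        if g != r:
            break
        matches += 1
    return matches / SUFFIX_LEN
```

Running this over a sample of training documents and averaging the overlap gives a rough memorization rate; distinguishing beneficial from harmful memorization would additionally require classifying the matched spans (e.g., guideline text vs. patient-specific content).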

Takeaways, Limitations

Takeaways:
Provides the first comprehensive analysis of the frequency, characteristics, volume, and impact of data memorization in medical LLMs.
Classifies memorized content into three categories (beneficial, uninformative, and harmful) and clearly characterizes each.
Highlights the implications of memorization for the development and deployment of medical LLMs.
Presents practical recommendations for promoting beneficial memorization, minimizing uninformative memorization, and mitigating harmful memorization.
Limitations:
Further research is needed to determine whether findings from the datasets analyzed here (e.g., Yale New Haven Health System records) generalize to other medical datasets.
Further research is needed on quantitative methods for measuring and assessing memorization.
Further research is needed on how memorization differs across LLM architectures and training methods.