Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Rethinking Data Protection in the (Generative) Artificial Intelligence Era

Created by
  • Haebom

Authors

Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren

Outline

This paper argues that existing concepts of data protection have become inadequate because generative AI has significantly changed the meaning and value of data. Because data plays a critical role throughout the AI lifecycle, protection must cover diverse forms of data, including training data, prompts, and outputs. To address this, the paper proposes a four-level taxonomy (unusability, privacy, traceability, and erasure) that captures the diverse data protection needs of modern generative AI models and systems. The framework gives a structured view of the tradeoffs between data usability and control across the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. The paper also analyzes representative technical approaches at each level and identifies regulatory blind spots that leave critical assets exposed. Ultimately, it offers a structural framework for aligning future AI technologies and governance with trustworthy data practices, giving timely guidance to developers, researchers, and regulators.

Takeaways, Limitations

Takeaways:
Presents a new perspective on data protection in the era of generative AI, structured as a four-level taxonomy.
Analyzes the tradeoffs between data usability and control, and presents data protection strategies across the AI pipeline.
Exposes regulatory blind spots and provides timely guidance for developers, researchers, and regulators.
Limitations:
Further research is needed to determine the practical applicability and effectiveness of the proposed taxonomy.
The analysis may be limited to representative cases rather than covering the full range of AI models and systems.
The technical approaches may lack detailed description, and the analysis may be biased toward specific technologies.