Sign In

How to extract the data you want from AI

Haebom
When you read the novel Harry Potter, you come across a concept called Legilimency . It's a magic that reads minds and memories, and it appears as the most important magical element in the latter half. (A device that shows the roots of each character's thoughts or past.) Later on, you can use it without a wand, just by making eye contact... Well, separately from that, there's something called Legilimency in artificial intelligence. In fact, if you think about it a little, it's a really good ability, but it's also a bit creepy. Reading someone's thoughts and mind doesn't seem like a very happy thing. At first glance, it might seem cool that artificial intelligence can extract any data it wants, but what if that information is someone's personal information or confidential information? I don't even want to imagine it.
<신비한 동물사전> 시리즈의 퀴니 골드스틴(앨리슨 수돌)이 Legilimency의 장인으로 나옵니다.
Artificial intelligence—specifically what we call large language models (LLMs)—store and generate data in ways that are almost like black boxes, making them very hard to control. That's why there's active research on ways to control them so they can deliver the specific or precise answers we want.
A recent study called <Scalable Extraction of Training Data from (Production) Language Models> discusses ways to extract desired or intended data from LLMs. This research is a collaboration among various experts including DeepMind at Google, the University of Washington, and Cornell University, so it's not a paper published to specifically target Google.
The likelihood of personal information leaking from a language model can vary according to a number of factors, depending on what the model learned from its training data. Language models are trained on countless data sources, and if certain patterns or pieces of data are fed repeatedly into a model, the model can end up 'remembering' them and using them during generation. This is known as 'memorization'.
Even if a model does 'remember' a specific piece of data, whether that data actually gets output and leaked depends on the situation. For example, data containing sensitive info could unintentionally come out through the model's responses to certain prompts or questions. But these incidents vary with the model's design, training, and deployment, and don't necessarily happen in the same way across all language models or all kinds of data.
The study shows that data can be extracted from not just open or semi-open source language models like LLaMA2 and Falcon, but also from closed models like ChatGPT. The researchers discovered that existing techniques didn't work on the aligned version of ChatGPT, so they developed a new 'divergence attack', enabling ChatGPT to extract training data at a rate 150 times higher than under its normal operation.
Scalable Extraction of Training Data from (Production) Language Models.pdf1.37MB
To prevent this from happening again in the future, the researchers shared the discovered vulnerabilities with the teams responsible for those models and published this paper. (This is what real research ethics means.) They strongly insist that making these findings public is necessary to direct attention to data security and the alignment problems of AI models. This is a limitation of current language models, and a challenge that must be overcome.
To be clear, this is not something to be misunderstood. Just putting personal information into an LLM doesn’t mean that it will always be leaked. But if you repeatedly enter the same personal info, or train and deploy a language model for applications where personal data protection is crucial without proper safeguards, then personal data can indeed be exposed.
Subscribe to 'haebom'
📚 Welcome to Haebom's archives.
---
I post articles related to IT 💻, economy 💰, and humanities 🎭.
If you are curious about my thoughts, perspectives or interests, please subscribe.
haebom@kakao.com
Subscribe