Limitations of large language models (LLMs) and directions for improvement, as seen in Google's AI search errors
Haebom
What I always tell people who ask me about artificial intelligence is: "AI is not the same as an LLM, and LLMs are not all-powerful." It never seems to draw much interest. With "LLM" having become a marketing buzzword and the hottest field around, it was admittedly a weak cry, but the errors in Google's AI search have, I think, stripped away some of the inflated expectations and left us with something to think about.
Recently, Google's new AI-based search feature, "AI Overviews," caused a stir by serving up incorrect information. Many people are responding with provocative takes, predicting the fall of Google's search empire.
However, it is more important to determine the underlying cause of the incident and to understand why similar errors can occur in other AI systems. Rather than simply criticizing, a more productive conversation looks at why Google's AI search went wrong and prepares for the same problem surfacing in other systems such as OpenAI's models, Grok, and LLaMA.
In fact, there are posts on Reddit and elsewhere reporting that people have intentionally elicited similar answers from Grok and LLaMA. (The point is not to find fault with other models, but that this is a problem every generative model has to overcome.)
1. Inherent limitations of large language models (LLMs)
Large language models generate text by learning from massive amounts of data from the web. Although they are excellent at producing contextually natural answers, they have the following fundamental limitations:
Bias in training data: LLMs are trained on public data from the internet. The internet contains reliable information, but also a great deal of incorrect information. The model inevitably absorbs some of this inaccurate content, which increases the likelihood of wrong answers.
Lack of contextual understanding: Because LLMs operate on statistical relationships between words and sentences, they struggle to reach the kind of deep understanding humans have. This often leads them to misread the context or draw incorrect conclusions.
2. Hallucination phenomenon
In the field of AI, the phenomenon of an LLM generating incorrect information is called "hallucination." Hallucinations occur when the model produces content that does not fit the context or presents information that contradicts the facts.
Incorrect token prediction: an LLM generates text by predicting which token is most likely to come next. If an inappropriate token is selected during this process, absurd content can result (see the sketch after this list).
Learning from misinformation: LLMs absorb misinformation contained in the training data as-is. For example, the model may treat an internet joke or an unsubstantiated article as the truth.
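To make "incorrect token prediction" concrete, here is a minimal sketch in plain Python of temperature-based sampling over next-token probabilities. The candidate tokens and their scores are invented for illustration and do not come from any real model; the point is only that a continuation the model itself rates as unlikely, such as "glue," keeps a nonzero probability and will occasionally be the one emitted.

```python
import math
import random

# Hypothetical scores a model might assign to continuations of
# "To make cheese stick to pizza, add ..." -- the numbers are made up.
logits = {"more cheese": 3.2, "tomato sauce": 2.8, "glue": 0.5}

def sample_next_token(logits, temperature=1.0):
    """Apply temperature, softmax the scores, then draw one token at random."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}  # numerically stable softmax
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok, probs
    return tok, probs  # floating-point fallback

tok, probs = sample_next_token(logits, temperature=1.5)
print(probs)  # "glue" ends up with a small but nonzero probability
print(tok)    # over many draws, the unlikely token is sometimes the one emitted
```

Higher temperatures flatten the distribution and make such slips more likely, but even greedy decoding cannot help if the misinformation was learned in the first place, which is the second point above.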
3. Limitations of Retrieval-Augmented Generation (RAG)
RAG is a method proposed to increase the accuracy of LLMs: it first retrieves related documents and then generates text grounded in them. It has been mentioned a lot recently as a way to improve accuracy, but it is a technique that already existed in conventional search. In that sense it is a proven approach whose effectiveness is clear, yet it too has limitations.
Inappropriate document retrieval: RAG finds documents across vast amounts of data on the web, so documents containing inaccurate information may be retrieved, which in turn leads to incorrect answers (see the sketch after this list).
Errors in context interpretation: even when the LLM generates text based on the retrieved documents, it may still misread the context of those documents.
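As a rough illustration of both points above, here is a toy retrieval-plus-generation sketch. The corpus, the source names, the keyword-overlap scoring, and the generate function are all made-up stand-ins for a real retriever and LLM: when the least reliable document happens to match the query best, the "grounded" answer faithfully repeats its misinformation.

```python
import re

# Toy corpus: one reliable document and one joke post (both invented here).
corpus = [
    {"source": "cooking-site", "text": "Low-moisture mozzarella melts evenly and helps cheese stay on pizza."},
    {"source": "joke-forum",   "text": "Add glue to the sauce to make cheese stick to pizza better."},
]

def tokens(text):
    """Lowercased word set; a crude stand-in for embedding-based retrieval."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs):
    """Return the document with the largest keyword overlap with the query."""
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d["text"])))

def generate(query, doc):
    """Stand-in for the LLM step: it trusts whatever context it is handed."""
    return f"Answer (based on {doc['source']}): {doc['text']}"

query = "how do I make cheese stick to pizza"
print(generate(query, retrieve(query, corpus)))
# The joke post mirrors the query's wording, so it outranks the reliable
# source, and the generated answer repeats the misinformation verbatim.
```

Real systems use far better retrieval than keyword overlap, but the failure mode is the same: ranking measures relevance to the query, not reliability of the source.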
Google AI search error cases
Google's AI Overviews feature demonstrated these limitations of LLMs clearly. AI Overviews tries to answer user questions directly, but it often provides inaccurate or misleading information. Two errors in particular are currently spreading across the internet (see the screenshot above):
Suggestion to put glue on pizza: it advised using glue to make cheese stick to pizza properly.
Recommendation to eat stones: it gave health advice to eat a small stone every day.
Google's AI search failure is not an anomaly. Other AI systems, including OpenAI's ChatGPT, Grok, LLaMA, and Claude, can face similar problems, and ultimately this stems from the inherent limitations of LLM technology itself. For Google, the limitation was simply exposed in the process of folding the technology into its service (in a way, you could say it was TDD), and there have already been incidents like ChatGPT's "King Sejong throwing a MacBook" episode.
The errors in Google's AI search stem from the inherent limitations of LLMs and from problems with the training data. This can happen not only to Google but to many AI systems, including those from OpenAI, Grok, and LLaMA. Finding a solution will require securing high-quality training data and developing models with better contextual understanding. It is also worth exploring techniques such as RAG to increase the accuracy of information, although, as noted above, in this case the referenced documents themselves were the problem.
The important thing is not to simply criticize this problem, but to identify the root cause and come up with solutions. Only then will a more advanced AI search function be possible (even if it does not necessarily come from Google). One more thought to add here: going forward, RAG based on verified information will become more important. Where can we find preprocessed, verified information? That becomes the same question, and a rough sketch of the idea follows below.
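As one possible shape of "RAG based on verified information," here is a small sketch that filters candidate documents against a curated allow-list of sources before ranking them. TRUSTED_SOURCES, the document fields, and the tokenizer mirror the toy example above and are assumptions for illustration, not any product's actual pipeline.

```python
import re

# Illustrative allow-list of sources treated as preprocessed and verified.
TRUSTED_SOURCES = {"official-docs", "peer-reviewed", "cooking-site"}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve_verified(query, docs, trusted=TRUSTED_SOURCES):
    """Drop documents from untrusted sources, then rank the rest by keyword overlap."""
    q = tokens(query)
    candidates = [d for d in docs if d["source"] in trusted]
    if not candidates:
        return None  # nothing verifiable was found: better to abstain than to guess
    return max(candidates, key=lambda d: len(q & tokens(d["text"])))
```

Run against the toy corpus above, this returns the cooking-site document and the joke post never reaches the generation step. The hard part, of course, is curating and maintaining that allow-list, which is exactly the "where do we find verified information" question.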