Information extraction

Language models are capable of performing a variety of natural language processing (NLP) tasks, such as classification, information extraction, and summarization, in addition to simply generating text. These models learn patterns from large amounts of text data and use them to generate new text or analyze the meaning of existing text.
For example, consider the task of extracting important information from a specific text. Using a language model, you can identify and extract key elements such as dates, places, and names of people. You can also analyze the topic or sentiment of the text. This is useful in fields such as deriving business insights, summarizing news, and analyzing social media.
Furthermore, these models can be used to answer user questions or understand and execute complex instructions, such as asking for a summary of a recent news article or research paper, or finding information on a specific topic. For example, let's say you have the following information:
South Korea's entertainment agency 'Management Soop' was founded in 2011 by CEO Kim Jang-gyun, who previously worked at SidusHQ and Fantagio. In December 2018, Kakao M acquired a 99.36% stake for 14 billion won. At the time, Kakao acquired BH, Jaywide, and Ready Entertainment with the aim of strengthening its entertainment business through Kakao M, and this was an actor agency that was acquired along with them. For reference, Soop Entertainment's 2017 business performance, which was disclosed at the time of acquisition, was sales of 18.486 billion won and net income of 308 million won. (Total assets of 4.911 billion won, total liabilities of 1.93 billion won) The affiliated artists are Gong Yoo, Gong Hyo-jin, Kim Min-joo, Kim Jae-wook, Kim Ji-soo, Nam Joo-hyuk, Nam Ji-hyun, Seo Hyun-jin, Suzy, Lee Chun-hee, Jang Sung-hoon, Jeon Do-yeon, Jeon Hye-jin, Jung Yu-mi, and Jung Ga-ram. Previously, there were Kim Min-hee, Kim Tae-geum, Ryu Seung-beom, Soo Ae, Yoo Min-kyu, Lee Jae-joon, Jung Il-woo, Choi Woo-shik, and Jeon So-ni.
This passage mixes several kinds of information about Management Soop: the founder, the share acquisition, the artists under its management, and so on. If what we want is the information about the artists, we can request it like this.
Prompt: From the information presented above, extract the list of celebrities affiliated with this company.
🤖
Let's extract the list of affiliated celebrities from the given data:
Gong Yoo, Gong Hyo Jin, Kim Min Joo, Kim Jae Wook, Kim Ji Soo, Nam Joo Hyuk, Nam Ji Hyun, Seo Hyun Jin, Suzy, Lee Chun Hee, Jang Sung Hoon, Jeon Do Yeon, Jeon Hye Jin, Jung Yu Mi, Jung Ga Ram
This is a list of celebrities affiliated with 'Management Soop'.
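If you want to run this kind of request from a program, the exchange above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and `ask_model` is an assumed stand-in for whatever LLM client you actually use, since the article does not name one. It also assumes the model answers with a single comma-separated line, as in the reply above.

```python
from typing import Callable, List


def build_extraction_prompt(context: str, instruction: str) -> str:
    """Place the source text first, then the extraction request."""
    return f"{context.strip()}\n\nPrompt: {instruction.strip()}"


def parse_name_list(reply: str) -> List[str]:
    """Split a comma-separated line of names into a clean list."""
    return [name.strip() for name in reply.split(",") if name.strip()]


def extract_names(
    context: str,
    instruction: str,
    ask_model: Callable[[str], str],
) -> List[str]:
    # ask_model is hypothetical: any function that sends a prompt
    # to a language model and returns its reply as a string.
    reply = ask_model(build_extraction_prompt(context, instruction))
    return parse_name_list(reply)


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call, returning a canned reply.
    return "Gong Yoo, Gong Hyo Jin, Suzy"


names = extract_names(
    "The affiliated artists are Gong Yoo, Gong Hyo-jin, and Suzy.",
    "Extract the list of celebrities affiliated with this company.",
    stub_model,
)
```

A real model may wrap the list in extra sentences, so in practice the parsing step usually needs to be more forgiving than a plain split on commas.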
This is a very simple example. If you add detailed instructions such as 'organize it into a table' or 'organize it using numbering', you can extract the data in the format you want. In fact, since the advent of large language models (LLMs), research results have been published showing that LLMs outperform conventional NLP methods in many areas.
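As a rough illustration of asking for a specific format, the sketch below appends a format instruction to the request and then reads back a numbered reply. The names `FORMAT_HINTS`, `add_format_hint`, and `parse_numbered_list` are hypothetical, and the sketch assumes the model follows the instruction and writes one item per line as "1. item" or "1) item".

```python
import re
from typing import List

# Hypothetical format instructions, worded after the examples in the text.
FORMAT_HINTS = {
    "table": "Organize the result into a table.",
    "numbered": "Organize the result using numbering, one item per line.",
}


def add_format_hint(instruction: str, output_format: str) -> str:
    """Append the chosen format instruction to the extraction request."""
    return f"{instruction.strip()} {FORMAT_HINTS[output_format]}"


def parse_numbered_list(reply: str) -> List[str]:
    """Collect the items from lines that look like '1. item' or '1) item'."""
    items = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s*(.+?)\s*$", line)
        if match:
            items.append(match.group(1))
    return items


request = add_format_hint("Extract the affiliated celebrities.", "numbered")
sample_reply = "Here is the list:\n1. Gong Yoo\n2. Gong Hyo Jin\n3) Suzy"
artists = parse_numbered_list(sample_reply)
```

Lines that do not start with a number, such as the introductory sentence in the sample reply, are simply skipped.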
Put to good use, this capability lets us summarize information from documents we use often, such as PDFs, or hand unorganized data to the model and have it organized for us (extracting and summarizing information).
In particular, it can be very useful when extracting and organizing various pieces of information in the form of text, such as shopping mall review data or newspaper articles.
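For review data, one possible approach is to ask for the same fields from every review and collect the answers as records. The sketch below is only an illustration under stated assumptions: the field names in `FIELDS` are invented for the example, `ask_model` is a hypothetical stand-in for a real LLM client, and the model is assumed to include one JSON object somewhere in its reply.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical fields, chosen only for illustration.
FIELDS = ("product", "sentiment", "complaint")


def build_review_prompt(review: str) -> str:
    """Ask for a fixed set of fields as a single JSON object."""
    keys = ", ".join(f'"{field}"' for field in FIELDS)
    return (
        "Extract the following fields from the review and answer with a "
        f"single JSON object using exactly these keys: {keys}. "
        "Use null for anything the review does not mention.\n\n"
        f"Review: {review.strip()}"
    )


def parse_record(reply: str) -> Dict[str, object]:
    """Read the JSON object from a reply, ignoring text around it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    # Keep only the requested fields; missing ones become None.
    return {field: data.get(field) for field in FIELDS}


def extract_records(
    reviews: Iterable[str],
    ask_model: Callable[[str], str],
) -> List[Dict[str, object]]:
    # ask_model is hypothetical: it sends a prompt and returns the reply text.
    return [parse_record(ask_model(build_review_prompt(r))) for r in reviews]
```

Because every record has the same keys, the result can be loaded directly into a spreadsheet or table for further analysis.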
ⓒ 2023. Haebom, all rights reserved.
It may be used for commercial purposes with permission from the copyright holder, provided the source is cited.