Prompt Project
KLEAR Textbook Activity Tool Development Project
Sujin_Kang
The Prompt Company has been building AI Activity tools and chatbots for Korean language education since January 2024.
This project is a collaboration with UH Press, the University of Hawaii at Manoa, and the Korean program in the Department of East Asian Languages and Literatures (EALL). In Spring 2024 we completed a pilot test of the tools and chatbots with students taking beginner Korean at the University of Hawaii; the first phase of the project produced 48 AI tools and chatbots.
The key outcome: prompts alone can compensate for the limitations of LLMs and deliver meaningful learning tools matched to students' proficiency levels.
This post introduces the first phase of the project, its background, how it was carried out, and the research findings from our user survey. The work was accepted at the 29th AATK conference, held at Indiana University in Indiana, USA, in 2024.
🧷
Abstract
Enhancing Korean Language Learning through AI and Chatbots: An In-Depth Study of Efficacy, User Experiences, and Challenges
The incorporation of Artificial Intelligence (AI) has become widespread in the field of language learning and teaching over the past decade. One form of AI, the chatbot, has gained popularity in language education for its ability to facilitate student learning in various aspects. This study focuses on evaluating the impact of these technologies on the Korean language learning experience, particularly at the beginner level. Its aim is to explore the effectiveness of generative AI in facilitating language acquisition, with a special emphasis on learners' engagement and perceptions.
Empirical studies by Zhang and Aslan (2021), Schmidt and Strasser (2022), Jeon (2021), and Aihua (2021) assert the potential benefits of AI chatbots in language learning. Language learning applications like Duolingo and Babbel utilize AI to provide personalized feedback, serving as valuable adjunct tools in the language learning experience. However, applying generative AI to language learning presents unique challenges due to its distinctive characteristics. Therefore, this study developed AI tools and chatbots utilizing the advanced capabilities of the ChatGPT-3.5 and ChatGPT-4 models, tailoring them specifically for Korean language learning contexts.
The current study involves the development of 48 unique AI tools and chatbots using advanced prompt engineering techniques. Designed to reinforce key language elements such as vocabulary, sentence structure, and verb conjugations, these chatbots were programmed to simulate real-world conversations, mirroring the scenarios presented in each textbook lesson. This functionality provides real-time feedback and correction on grammatical errors and vocabulary misuse. The primary aim is to create an immersive learning environment where learners can apply and test their language skills in practical, conversational contexts.
Following the pilot test of these chatbots, a detailed user survey was conducted with 50 learners at a university in the US who had prior experience learning Korean. The survey sought to understand various aspects of their learning process, including their firsthand experiences with the AI tools and chatbots selected for this study.
According to the survey results, 70% of participants affirmed the utility of AI supplementary tools in their language progress. The conjugation tool, in particular, received high praise for its effectiveness in practicing grammar and vocabulary in a conversational setting. Notably, more than half of the participants acknowledged that AI tools positively influenced their understanding of Korean, especially in recognizing the subtle differences between Korean and English and appreciating the intricate details of language learning. However, the study also highlighted significant limitations and challenges. A notable issue was the occurrence of AI-generated inaccuracies or 'hallucinations', observed in about 20% of the cases. Participants reported instances where the AI provided incorrect sentence structures or tense forms, underlining its imperfection as a learning aid. The AI tools also struggled with nuances, specific contexts, and appropriate situational responses.
The findings suggest that while AI can be a powerful aid in language learning, educators and developers must address its current limitations and ensure learners are well-informed about these technologies.
Results and Insights:
1. Positive Feedback: 70% of participants affirmed the utility of AI supplementary tools in their language learning progress. The conjugation tool was particularly praised for its effectiveness in practicing grammar and vocabulary in conversational settings.
2. Improved Understanding: More than half of the participants acknowledged that AI tools positively influenced their understanding of Korean. They noted an enhanced ability to recognize subtle differences between Korean and English and appreciated the intricate details of language learning facilitated by these tools.
3. Challenges and Limitations: Despite the positive feedback, significant limitations were also highlighted. AI-generated inaccuracies or 'hallucinations', such as incorrect sentence structures or tense forms, appeared in approximately 20% of cases. Additionally, the AI tools struggled with nuances, specific contexts, and appropriate situational responses.
The findings from this project suggest that while AI-powered tools can significantly aid language learning, it is crucial to address their current limitations. Educators and developers must ensure learners are aware of these limitations and provide guidance on how to use the AI tools effectively.
Methodology and Approach for Creating AI-Powered Activity Tools
Development Process:
1. Selection of LLM Models: The study utilized the advanced capabilities of the ChatGPT-3.5 and ChatGPT-4 models, chosen for their superior natural language processing capabilities.
2. Prompt Engineering: A total of 48 unique AI tools and chatbots were developed using advanced prompt engineering techniques. This process involved designing specific prompts that could elicit desired responses from the AI, ensuring that the tools could simulate real-world conversations and provide meaningful feedback to learners.
3. Alignment with KLEAR Textbooks: The tools and chatbots were aligned with the KLEAR Textbook lessons. This alignment ensured that the chatbots reinforced key language elements presented in the textbooks, such as vocabulary, sentence structure, and verb conjugations.
4. Simulation of Real-World Conversations: The chatbots were programmed to mirror the scenarios presented in each textbook lesson, creating an immersive learning environment. They provided real-time feedback and correction on grammatical errors and vocabulary misuse, allowing learners to apply and test their language skills in practical contexts. A minimal sketch of how such a chatbot might be wired up follows this list.
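To make the development process concrete, here is a minimal sketch of how one of these lesson chatbots might be wired up, assuming the OpenAI Python SDK. The LESSON_PROMPT text, the chat_turn helper, and the model choice are illustrative assumptions, not the project's actual configuration.

```python
# A minimal sketch of a lesson chatbot loop. Assumes the OpenAI Python SDK;
# the prompt text and model choice are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt built with the techniques described above.
LESSON_PROMPT = (
    "You are an expert Korean language teacher. "
    "Use only beginner-level vocabulary from Lesson 2 of the textbook. "
    "Correct the learner's grammar mistakes and explain them in English."
)

def chat_turn(history: list[dict], user_message: str) -> str:
    """Send one learner turn and return the chatbot's reply."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4",  # the project used ChatGPT-3.5 and ChatGPT-4
        messages=[{"role": "system", "content": LESSON_PROMPT}] + history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

history: list[dict] = []
print(chat_turn(history, "학교 식당 커피가 어때요?"))
```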
Challenging Points for Prompt Engineering
1. Controlling vocabulary generation: Beginner-level words and sentences need to be generated, but LLMs cannot adjust the level on their own; they often produce sentences that are too difficult for beginner Korean learners to understand.
2. Controlling sentence level: Practice problems similar to the textbook's dialogue scenarios need to be generated for students, but control over this is lacking.
3. Maintaining consistency and context: It was challenging to stay consistent with the curriculum and to generate content appropriately tailored to the learners' proficiency levels.
Solutions through Prompt Engineering
1️⃣ Few-shot prompt engineering
With few-shot prompt engineering, you can explicitly instruct the LLM to generate content at a beginner level. For instance, you can include specific instructions in the prompt, such as "Create simple sentences with basic vocabulary suitable for beginner Korean learners." You can also provide examples of the type of sentences you want, to ensure the model understands the desired complexity level. In the following example, I restrict the set of words the LLM is allowed to generate.
Prompt Example. Lesson 2. 어때요?
"""You are an expert Korean language teacher. Your task is to generate extremely simple questions based on the {{형용사}} inputted by the user. Limit your response to less than 6 words. Respond in Korean. Include the English translation at the end of your response.
(Example) U: 맛있다. A: 학교 식당 커피가 어때요? 맛있어요. How is the school coffee? Delicious. ###
U: 많다. A: 한국어 수업 숙제가 어때요? 재미있어요. How is your Korean homework? Fun. ###
U: 좋다. A: 요즘 기분이 어때요? 좋아요. How are you feeling? Good. ###"""
For the {형용사} (adjective) slot, I built a word bank so that the model generates words only from the given list. The goal was to keep it from producing out-of-level vocabulary, and this controls it quite well.
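For illustration, here is a sketch of how such a word-bank restriction might be assembled in code before being sent to the model. The word list and variable names are hypothetical, not the project's actual bank.

```python
# Hypothetical word bank; the real list came from the lesson's vocabulary.
WORD_BANK = ["맛있다", "많다", "좋다", "재미있다", "바쁘다"]

# Inject the allowed adjectives directly into the few-shot prompt so the
# model only draws from the given list.
FEW_SHOT_PROMPT = (
    "You are an expert Korean language teacher. Generate extremely simple "
    "questions based on the adjective the user inputs. Only use adjectives "
    f"from this list: {', '.join(WORD_BANK)}. Limit your response to fewer "
    "than 6 words. Respond in Korean, then include the English translation.\n"
    "(Example) U: 맛있다. A: 학교 식당 커피가 어때요? 맛있어요. "
    "How is the school coffee? Delicious. ###"
)
print(FEW_SHOT_PROMPT)
```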
2️⃣ Chain-of-thought Prompting Technique
A specific problem I found was that LLMs often produced sentences too difficult for beginner learners of Korean. I solved this with the chain-of-thought prompting technique: you guide the model step by step to ensure it generates simple sentences suitable for beginners. By explicitly instructing the model to think through each step, the complexity of the output can be controlled. In the following example, you can see that the task was broken down into manageable steps that guide the model through the process, and the output aligned with the textbook content and scenarios.
Prompt Example. Lesson 2. "Describing People" Practice Bot
""" You are an expert Korean language teacher. #STEP ONE: After the user opens the conversation with "네!" or "Let's go!" or "I'm ready," your first task is to generate a table of information in Korean about an imaginary student. Your STEP ONE response should be in Korean. Do not include translations. Each field must not exceed 15 characters. Do not include any explanations. Do not respond in full sentences. Create a list of random characteristics that includes the following information: **— Name: School Year: Nationality: Major: Relationship (With user): Hometown: —** ##STEP TWO: When the user responds with their paragraph, your task is to point out any grammar mistakes and offer a corrected version. Your grammar corrections should be in English, but keep the rest of the interaction in Korean. After you provide corrections, start the cycle over by offering a new unique set of characteristics. Do not repeat any of the information you generated previously. Your STEP TWO response should be in English for the corrections and Korean for the new table. Repeat this cycle until the user closes the conversation. Do not use only Korean names, use other countries' names as well. ###### The following are examples of correct interactions. In these examples, U is the user and A is you: **U: "네! Let's go!"** **A:** Name: 나타샤 School Year: 3학년 Nationality: 이탈리아 사람 Major: 경영학 Relationship (With user): 친구 Hometown: 로마 ### STEP THREE: Now, please turn this table into a paragraph in Korean, and I'll check your grammar.** **U: "제 친구의 이름은 나타샤예요. 나타샤는 3학년이에요. 나타샤은 이탈리아 사람잉에요. 나타샤의 전공은 경영학이에요. 나타샤는 로마에서 왔어요."** - *A: "That looks great! Just a few notes: —Grammar Mistakes: 1. 나타샤은 → 나타샤는 2. 사람잉에요 → 사람이에요 —Corrected Paragraph: 제 친구의 이름은 나타샤예요. 나타샤는 3학년이에요. 나타샤는 이탈리아 사람이에요. 나타샤의 전공은 경영학이에요. 나타샤는 로마에서 왔어요. —New Set of Characteristics: Name: 루카 School Year: 2학년 Nationality: 일본 사람 Major: 의학 Relationship (With user): 동료 Hometown: 도쿄" """
Prompt Engineering Strategies
1. Structured Prompts
The key point is structuring the prompt in a form the LLM can easily understand. Headings, bullet points (using hash marks), and numbered lists separate the elements of the prompt. Delivering the task in a form the AI parses easily produces results that fit the purpose.
2. Iterative Refinement
The LLM's task is divided into three steps that it works through repeatedly. This repetition reduces mistakes in the AI's answers and prevents incorrect output; above all, it yields results that match the goal exactly.
3. Contextual Relevance and Diversity
You can control context by precisely limiting the LLM's goal. Here the goal was teaching Korean grammar, so the prompt was directed at correcting and explaining the user's grammar errors. Templates were also used so that learners practice at a consistent level, as sketched below.
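As an illustration of the template strategy, here is a hypothetical sketch of how a structured prompt template might be filled in per lesson. The section names and placeholders are assumptions, not the project's exact template.

```python
# Hypothetical structured-prompt template: headings, hash marks, and
# numbered steps give the model an easy-to-parse task description.
PROMPT_TEMPLATE = """You are an expert Korean language teacher.

#GOAL: Teach {grammar_point} to {level} learners.
#STEP ONE: Generate a practice item using only Lesson {lesson} vocabulary.
#STEP TWO: When the user answers, correct any grammar mistakes in English.
#STEP THREE: Offer a new practice item and repeat the cycle.

###### Examples of correct interactions (U is the user, A is you):
{few_shot_examples}
"""

# Fill the template for a specific lesson and grammar point.
prompt = PROMPT_TEMPLATE.format(
    grammar_point="adjective conjugation",
    level="beginner",
    lesson=2,
    few_shot_examples="U: 맛있다. A: 학교 식당 커피가 어때요? 맛있어요.",
)
print(prompt)
```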
Sujin_Kang
Research on Automating Prompt Evaluation
Human annotators labeling conversation data. There are plenty of quantitative benchmarks for feeding an input (prompt) into an LLM and scoring its answer; arXiv is overflowing with such metrics. But most of them focus on how accurately a language model answers questions that have a "correct" answer: math, arithmetic, general knowledge, and the like. User prompts, however, often have no single right answer, so a qualitative approach is needed. It is hard to decide which model answers count as good, why they are good, and by what criteria, so reliable qualitative metrics are difficult to find.
✅ Research on qualitative metrics
I am in the middle of research on automating prompt evaluation. I believe the answer lies in how satisfied or dissatisfied the end user is with the language model's output. Because generative AI is a conversational interface, the structure of each turn reveals a great deal.
✅ Preferred and dispreferred organization in conversation analysis
If the user likes the model's answer, a preferred turn structure appears clearly; if not, a dispreferred organization shows up, in explicit language. You can then build a metric by searching the conversation for what caused the satisfaction or dissatisfaction.
✅ LLMs vs. humans at evaluating prompt responses
This is the stage where each prompt and its output are scored against the metrics: for a dataset of 100 conversations, say, an LLM and humans each score them against 10 metrics. There is an experience from this process I want to share. An LLM stays remarkably self-consistent across repeated evaluation runs, and it is also far faster than a person. A human, however, shows very low self-consistency between a first and second scoring pass. Early on, while building the automated evaluation metrics, I believed humans would always outperform LLMs, so I asked four friends to do the evaluation. Their workload was:
✔ 900 turns (including single and multi turn, about 17,000 items) × 3 LLMs = 51,000 sentences 😢
Three of the four gave up partway, and the one who remained completed only 50% of the total, with dismal results; I suspect scores were assigned carelessly. The graphs showed that consistency was low even between people, and that the models' scores did not match the humans' either. (In the graphs, darker colors indicated lower consistency.)
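To make the self-consistency comparison concrete, here is a minimal sketch of the kind of agreement check described above. The scores are invented purely for illustration; the actual study used 10 metrics over 51,000 sentences.

```python
# A sketch of the consistency check described above: score the same items
# twice and measure exact agreement. All scores below are made up.
def agreement(scores_a: list[int], scores_b: list[int]) -> float:
    """Fraction of items given the same score in both passes."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)

llm_pass1 = [4, 3, 5, 2, 4, 4, 3, 5]
llm_pass2 = [4, 3, 5, 2, 4, 4, 3, 5]    # LLMs tended to repeat themselves
human_pass1 = [4, 2, 5, 3, 4, 3, 3, 4]
human_pass2 = [3, 4, 4, 2, 5, 3, 2, 4]  # humans drifted between passes

print(f"LLM self-consistency:   {agreement(llm_pass1, llm_pass2):.2f}")
print(f"Human self-consistency: {agreement(human_pass1, human_pass2):.2f}")
```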